Research Model for Time-Separation
Integrated Communication 


By H. E. VAUGHAN 


(Manuscript received March 11, 1959) 


A new communication system concept which is an important step toward 
an all-digital telecommunication plant is discussed. A research model, 
called ESSEX (Experimental Solid State Exchange), which combines remote 
line concentration, time-separation switching and PCM transmission is 
introduced to demonstrate the concept. The model, which uses solid state 
devices, works at the speed of a full-size system. 


I. INTRODUCTION 


A communication system requires channels for transmission of infor- 
mation and switching arrangements to interconnect the channels. At 
present, the transmission problem and switching problems are handled 
separately. Transmission channels may be divided into three classes: 
space separation, frequency separation and time separation. All have 
been in use for some time. Switching arrangements may be divided into 
the same three classes.¹ The space-separation class includes all
telephone switching systems in use today. Frequency separation systems
have been studied and are not economically feasible at this time. Ex- 
ploratory switching systems in the time-separation class are being con- 
sidered in this country and abroad. 

This paper reviews the use of time-separation techniques for trans- 
mission systems and for switching systems, points out that the avail- 
ability of solid state devices has revived interest in the subject and 
indicates that techniques using these devices are now feasible for both
systems. It shows that an integrated communication system using 
these techniques is much more attractive than a combination of sub- 
systems using time separation and presents a research model which 
demonstrates the technical feasibility of such an integrated system.

The research model is primarily a digital system. In it,
information-producing terminals (in our specific case, telephone
subscribers’ sets) in a small group are connected over voice-frequency
cable pairs to a
small switching unit remote from the central switching point. This unit 
connects them to a time-separation or multiplex channel group and con- 
verts the signals to digital form. The digital signals are transmitted and 
switched at various locations and are reconverted to original analog form 
at another small remote switching unit, which serves another group of 
terminals, or at the originating switching unit. 

A laboratory model of this system has been built. It is known as 
ESSEX (Experimental Solid State Exchange) and, as its name implies, 
is built of solid state devices. 

The following sections discuss the background for the experiment, out- 
line the general plan, describe the laboratory model and give some re- 
sults. 


II. BACKGROUND


The common control type of switching system has two basic sections: 
(a) the switching network for interconnecting channels and (b) the
common control section. Many proposals and experiments have been made
which use electronics for the common control section.?* They offer many
advantages over the slower electromechanical control systems. Electronic
components are at least one thousand times as fast as the mechanical 
ones now in use. Common control systems made of such devices? time- 
share the circuits, and thus can carry out their work with fewer devices. 
One control unit plus one spare can handle a very large switching system, 
and it can be made so that it is sufficiently flexible to handle new services. 

The development of an electronic common control system is sub- 
stantially complete. This system could be modified for use with a time- 
separation system of the type discussed herein; therefore, it will not be 
treated in this paper. The switching network is another story. 

Existing and most proposed networks are in the space-separation class. 
Large numbers of switches are required to implement them. Substitution 
of electronic switches for electromechanical switches may afford indirect 
saving in the control and reduce the cost of the switches, but it does not 
reduce the number of switches. Space-separation networks require many 
switches or crosspoints per line. Time-separation switching networks, in
which one physical path carries many conversations by time sharing, 
require something in the order of two switches per line. In addition, they 
require a few bits of memory per line to remember which switch is op- 
erated at what time interval. The switches, although fewer in number, 
must operate much faster than those used in a space-separation network. 
Present solid state switches are sufficiently fast that speed is no problem. 
Such a network can now be built. And a time-separation switching net- 
work is an important part of the research model. 

The switching system mentioned above is only part of a communica- 
tion system. It may represent about one-half of the cost. The other por- 
tion of the cost is for the transmission channels. In a telephone system, 
this is primarily copper cost, the cost of the cable pairs between sub- 
scribers and central office, and between offices. One way to reduce the 
amount of cable is to locate part of the central office in small pieces near 
to groups of subscribers.‘ These remote pieces of the switching network 
are called line concentrators. They provide switching so that a group of 
subscribers may share the use of cable conductors between the remote 
unit and the central switching point. The number of cable pairs required 
between the remote unit and the central point may be reduced by about 
80 per cent. Thus, a reorganization and dispersion of part of the switch- 
ing network can reduce the amount of cable required. Line concentration 
is another important part of the research model. 

Additional saving of cable conductors can be achieved by the use of 
either frequency-separation or time-separation techniques for interoffice 
trunks. Cost savings depend on the lengths of the cable runs and the 
cost of the terminal equipment for multiplexing. The advent of solid 
state devices is affording new opportunities for reducing the channel 
length needed to prove in multiplexed channels in place of individual 
space separated channels. One such method is time sharing of the cable 
conductors through the use of PCM (Pulse Code Modulation) transmis- 
sion. This is another important part of the research model. 

The parts mentioned above could be all considered and used in a com- 
munication system such as is shown in Fig. 1. In such a system voice- 
frequency signals would be time-switched at the concentrator and then 
changed back to voice frequency. For PCM transmission they would be 
converted to digital signals and then back again to voice at the central 
switching point; then again time-switched and changed back to voice, 
etc. This process is quite involved and, in fact, unnecessary. It can be 
simplified by removing all the transitions between time separation and 
space separation except those at the ends of the system, thus producing 
a more economical system than that achieved by just building up
subsystems. Such simplification leads to the system shown in Fig. 2, the
ESSEX System. Voice-frequency signals from a group of subscriber lines 
are switched at a remote concentrator unit and converted to digital sig- 
nals. The digital signals are transmitted, switched at one or more central 
switching points and handled as digital signals until they leave the sys- 
tem at another concentrator or trunkor, which is a converter for connect- 
ing to voice-frequency trunks. 


ESSEX employs all the parts mentioned above and combines them in
a manner which minimizes the amount of equipment and cable conductors
required and provides a uniform-quality fixed-loss path between
any two voice-frequency terminals, independent of the distance and the
number of switching points between them.

The quality is fixed by the characteristics of the line filters, the am- 
plitude range of the system and the noise. The voice is coded in digital 
form, and the pulses are retimed and reshaped at regular intervals. Thus, 
in the ideal case, the code at the distant end will be the same as that at 
the transmitting end independent of distance. It follows that length of 
the transmission path has no effect on the loss and quality, except as 
increased length increases the probability of errors in transmission due 
to interference from external sources and to timing irregularities. ESSEX 
provides analog-to-digital conversion as near to the subscriber as feasible 
and then operates as a digital system. In addition, the switches at the 
central switching point are simplified, since they handle digital signals 
only. 

Time-separation techniques for transmission systems and for switching 
systems have been under investigation for many years. The potential 
advantages of an integrated digital communication system and increased 
demands for various speeds of digital transmission have spurred the ef- 
forts on the present research model. The use of solid state devices makes 
the system attractive. Such devices require less space and less power, 
and are fast enough to do the job. As the speed of the devices improves, 
more operations may be performed with the same number of devices or 
the same operations may be handled by fewer devices, and the picture 
will become even more attractive. 


III. BASIC PLAN 


The basic plan has a number of remote line concentrators, using time- 
separation switching and transmission, connected at a central point in 
a time- and space-separation switching network. Trunkors, which act as 
connecting and converting units for trunks, registers, etc., are similarly
connected. Each concentrator has some switching and control circuits
located at the central switching point. The switches connect the four- 
wire transmission circuits from the concentrators to other concentrators, 
to trunkors or to routes to other offices. The control circuits have a 
memory which holds the information used to control both remote and 
central switches for the duration of a call. These circuits also control line 
scanning for supervision and handle programming for setting up and re- 
leasing calls. Each concentrator control unit is connected to the common 
control for the office. 

Before going further into the system, it will be well to mention the 
sampling and transmission action. It is a well-known fact that, if a short 
time sample of a signal limited in frequency to x/2 cps is taken x times
per second, transmitted and filtered, then any signal components in the
band up to x/2 cps will be reproduced at the output of the filter.* In
ESSEX, the sampling rate has been set at 8000 per second. Thus, the 
period between one sample of a message and the next sample is 125 
microseconds, which is called a frame. The number of channels that can 
be inserted in a frame depends on the length of the sample and the guard 
space between samples. In ESSEX, each channel uses a time slot of 5.2 
microseconds, so 24 channels are handled in a frame. Twenty-three chan- 
nels are used for speech, and the 24th is used for supervisory functions. 
Each time slot has eight pulse positions, seven for the coded PCM 
signal and one for other functions. Thus, the pulse repetition rate on 
the four-wire digital transmission line is 1.536 mc. The four-wire system
will use ordinary exchange cable pairs. Pulse regenerative repeaters are
required every 6000 feet for transmitting pulses at the 1.536-mc rate in
the case of 22-gauge paired cable. Closer repeater spacing may be re- 
quired if the noise is greater than that now anticipated. 
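
The figures quoted above follow from the sampling rate, the number of channels per frame and the number of pulse positions per time slot. The short Python sketch below is illustrative only; the constant and function names are assumptions introduced here and are not part of the ESSEX design. It reproduces the 125-microsecond frame, the time slot of about 5.2 microseconds and the 1.536-mc pulse rate.

```python
# Frame arithmetic for the time-separation format described in the text.
# All names are illustrative; only the numerical values come from the paper.
SAMPLES_PER_SECOND = 8000   # sampling rate for each channel
CHANNELS_PER_FRAME = 24     # 23 speech channels plus 1 supervisory channel
PULSES_PER_SLOT = 8         # 7 PCM code pulses plus 1 for other functions


def frame_period_us(rate=SAMPLES_PER_SECOND):
    """Time between successive samples of one message, in microseconds."""
    return 1_000_000 / rate


def slot_width_us(rate=SAMPLES_PER_SECOND, channels=CHANNELS_PER_FRAME):
    """Width of one channel time slot, in microseconds."""
    return frame_period_us(rate) / channels


def pulse_rate_per_second(rate=SAMPLES_PER_SECOND,
                          channels=CHANNELS_PER_FRAME,
                          pulses=PULSES_PER_SLOT):
    """Pulse repetition rate on the digital line, in pulses per second."""
    return rate * channels * pulses


def highest_reproduced_frequency_cps(rate=SAMPLES_PER_SECOND):
    """Upper edge of the band recoverable from samples taken at `rate`."""
    return rate / 2


if __name__ == "__main__":
    print(frame_period_us())          # frame period in microseconds
    print(round(slot_width_us(), 2))  # slot width in microseconds
    print(pulse_rate_per_second())    # pulses per second on the line
```

The exact slot width is 125/24 microseconds, which the text rounds to 5.2 microseconds.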


3.1 Remote Concentrator 


A concentrator module consists of a remote unit and a central unit, as 
shown in Fig. 3. Let us consider the remote part of the line concentrator, 
shown on the left side of the figure, which is the starting point in the 
system. A maximum of 255 voice-frequency lines appear as inputs, and 
three cable pairs carry digital signals to and from the central unit. One 
pair, designated S, the send pair, takes PCM signals to the central unit. 
The second, the receive pair, R, brings PCM signals from the central 
unit. And the third, the control pair C, brings control words from the 
memory in the central unit. Each line requires a line circuit, which
contains a gate and a filter. The line circuit, a plug-in package, is the
lowest-order module in the system. These modules are added only as customers
are connected to the concentrator. The ensemble of line-circuit packages
makes up a two-wire bilateral switching stage controlled by a selector. 
The output of this stage is a two-wire time-separation PAM bus with 
23 time slots or channels for use as links to the central office. A time 
diagram which may be helpful in visualizing some of the operation is 
shown in Fig. 4. The memory which controls the selection of the gates
is located at the central point. The information from it is sent over the 
control pair C as eight-bit words in each time slot. These words pass 
through the selector to control the line gates. Each word designates a 
line gate number (LGN) and can select one of 255 gates. 

The PAM two-wire bus must be converted to a four-wire bus so that 
the signals can be handled on a PCM basis. This conversion is accom- 
plished by a circuit called a time-division hybrid. In brief, it permits a 
signal to pass from a line to the send bus or from a receive bus to a line, 
but never permits a direct connection from send to receive. PAM signals 
on the send bus are coded into seven-bit PCM signals and sent to the 
central point. Incoming PCM signals are decoded and presented as 
PAM signals to the two-wire bus and then to the voice-frequency line. 
Note that the line circuit is a passive circuit and that all the signal power 
needed is supplied by the common receiving amplifier in the receive bus. 
Timing signals necessary for the operation of this remote unit are gen- 
erated by a local clock which is slaved to a master clock at the central 
switching point. Since both switching control and timing control signals
originate at the central switching point, the remote unit is actually a
slave. The problems of timing and synchronizing both remote and
central units will be discussed in Section 3.4.

Fig. 4. Timing diagram for remote concentrator.


3.2 Concentrator Central Unit 


The central unit of the concentrator module shown on the right side 
of Fig. 3 is made up of digital circuitry that includes the memory to 
control both remote and central switches, the central switches with their 
selector and a concentrator control unit. In addition, there is a delay 
pad and servo, which is discussed in Section 3.4. As mentioned above, 
the memory stores information which controls the operation of line gates. 
It also stores words to control the operation of the central switches 
associated with a particular concentrator unit and call progress
information concerning the states of the calls being handled. For example,
“on-hook” and “off-hook” conditions are recorded in this memory. The
memory stores 24 bits of information for each time slot or channel:
eight bits for the line gate, five bits for the central switch, eight bits for
call progress marks, and three bits for checking. Each 24-bit word is
read out every 125 microseconds; thus, the complete memory can be 
searched in this period to determine which channels are busy or idle or 
for any other pertinent information. 
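
The division of the 24-bit memory word can be illustrated with a small packing routine. The Python below is a sketch under stated assumptions: the field widths (8, 5, 8 and 3 bits) are those given above, but the order of the fields within the word, the field names and the function names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class SlotWord(NamedTuple):
    """One memory word per time slot. Field order is an assumption."""
    line_gate: int       # 8 bits: line gate number sent to the remote unit
    central_switch: int  # 5 bits: selects one of 32 junctor gate pairs
    progress: int        # 8 bits: call progress marks
    check: int           # 3 bits: checking


WIDTHS = (8, 5, 8, 3)    # widths from the text; they sum to 24 bits


def pack(word: SlotWord) -> int:
    """Pack the four fields into one 24-bit integer, first field highest."""
    value = 0
    for field, width in zip(word, WIDTHS):
        if not 0 <= field < (1 << width):
            raise ValueError(f"field value {field} does not fit in {width} bits")
        value = (value << width) | field
    return value


def unpack(value: int) -> SlotWord:
    """Recover the four fields from a 24-bit integer produced by pack()."""
    if not 0 <= value < (1 << sum(WIDTHS)):
        raise ValueError("value does not fit in 24 bits")
    fields = []
    for width in reversed(WIDTHS):
        fields.append(value & ((1 << width) - 1))
        value >>= width
    return SlotWord(*reversed(fields))
```

With one such word for each of the 24 time slots, a complete pass through the memory takes one 125-microsecond frame, as stated above.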

The central stage switches or junctor gates are simple digital AND 
gates which switch digital signals unilaterally. The switches handle low 
power, and thus the selector, which uses a five-bit input to mark one 
pair of 32 pairs of switches, is rather simple. The central switches for 
each concentrator are connected to the central switches of all other 
concentrators by junctors on a space-separation basis as in Fig. 5. Thus, 
each concentrator has access to all other concentrators, trunkors and 
junctors to other office modules over 32 separate paths in any of 23 
time slots. A call from one concentrator to another must use the same 
time slot in each concentrator. The switching plan is really a four-stage 
network, one stage at each remote unit and one for each central
concentrator unit. Some blocking occurs due to the concentration to 23
channels and some due to time slot mismatch. Although a remote unit
can handle up to 255 lines, about 115 lines, each submitting one tenth
of an erlang, would load the 23 channels to 50 per cent of capacity. In 
the future, new services may use lines with much lower traffic, and then 
the large number of lines may be useful. If the call is to another module, 
the same number of stages are used, since only one switch is made in 
each central unit. The only difference is in the length of the junctor. 
Consider a call from a concentrator to a trunkor — a dial tone connec- 
tion — and assume for the moment that there is no delay in the system. 
Information in the memory for concentrator A opens a gate in time slot 
6 and opens the send and receive junctor gate pair 3 in the same time 
slot. Information in memory for the trunkor Z opens a gate in time slot 
6 in the trunkor and send and receive junctor gate pair 3. Information 
then passes directly from A to Z and from Z to A, and the operation
is repeated 8000 times per second. 
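
Two points in this paragraph lend themselves to a brief sketch: the requirement that a call use the same time slot in both units, which gives rise to time slot mismatch, and the loading of the 23 channels. The Python below is illustrative only. The function names, the numbering of slots from 1 to 23 and the choice of the lowest-numbered common idle slot are assumptions made here, not features taken from the model.

```python
SPEECH_SLOTS = 23   # speech channels available in each concentrator or trunkor


def common_idle_slot(busy_a, busy_b, slots=SPEECH_SLOTS):
    """Return the lowest slot idle in both units, or None if there is none.

    busy_a and busy_b are sets of slot numbers (1..slots) already in use.
    Picking the lowest common slot is an assumed hunting rule.
    """
    for slot in range(1, slots + 1):
        if slot not in busy_a and slot not in busy_b:
            return slot
    return None


def occupancy(lines, erlangs_per_line, channels=SPEECH_SLOTS):
    """Fraction of channel capacity taken by the offered load."""
    return lines * erlangs_per_line / channels


if __name__ == "__main__":
    # Unit A has slots 1-3 busy and unit Z has slots 4-5 busy.
    print(common_idle_slot({1, 2, 3}, {4, 5}))
    # Each unit has idle slots, but none in common: time slot mismatch.
    print(common_idle_slot(set(range(1, 13)), set(range(13, 24))))
    print(occupancy(115, 0.1))
```

The second case shows how a call can be blocked even though neither unit has all 23 channels busy.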

An important section of the central concentrator unit is the con- 
centrator control, which works in cooperation with the common control. 
The division of responsibility between concentrator control units and 
common control is a field for further investigation, since the present 
division is based on judgment with limited knowledge of the problem. 
A detailed treatment of the organization and operation of the con- 
centrator control will be given in a future paper. A simplified diagram 
of the control section is shown in Fig. 6. The most complex part is the
logic which controls the generation, interpretation and modification of
call progress marks. Some of these marks are operating orders to and
from the common control. Supervisory information from the remote 
unit is held in the memory, and logical operations are performed on this
information when necessary. Control of ringing, supervisory tones and
answer indications is also handled in this section. Line scanning is also
controlled here.


3.3 Supervision, Dialing and Ringing 


Many auxiliary functions must be performed in order to use this 
transmission and switching system as a telephone system. Detection of 
“on-hook” and “off-hook” line conditions to determine the subscriber’s
wishes is done by scanning. The central control sends out a line designa- 
tion in the 24th time slot, which is reserved for this purpose. This eight- 
bit word controls a transistor in the line package through the selector 
used to control the line gate. A combination of the pulse from the 
selector and current flowing in the subscriber’s loop (“off-hook”
condition) produces a pulse on a lead common to all such transistors. Thus,
scanning requires little additional equipment in a time separation 
system. If the line is “off-hook”, a “1” is returned to central in the 
eighth-bit position of the 23rd time slot on the S lead. Every fourth 
frame, a new number is sent to the remote unit, so that 255 lines are 
scanned in about one-eighth second. The result of the scan is stored in 
the call progress mark section of the memory if action is called for. 
Switching networks using electronic crosspoints have a limited power- 
handling capacity; thus, it is necessary to use low-level tones. Ringing 
is done by sending tones in the voice band to actuate a tone ringer in 
the subset.’ Ringing tone in the form of PCM signals is applied through 
a separate gate for each concentrator R lead, and audible ring in the 
same PCM form is applied through a separate gate for return to the 
originating end of the circuit (see Fig. 7). This arrangement permits 
full access on a time-separation basis to all 23 channels. It can be shown 
that this helps to reduce blocking. Busy tones or other tones in PCM 
form may be switched in the same way, and trunk splitting also can be 
taken care of in this fashion. This simple means for applying special
tones is one more advantage of digital time-division switching. 
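
The scanning figures given earlier in this section can be checked in a few lines. The Python below is a minimal sketch, assuming lines are numbered 1 to 255 and scanned in ascending order; these details, like the function names, are assumptions made for illustration and are not stated in the text.

```python
FRAME_US = 125        # frame period in microseconds
FRAMES_PER_LINE = 4   # a new line number is sent every fourth frame
MAX_LINES = 255       # lines served by one remote unit


def scan_cycle_seconds(lines=MAX_LINES, frames_per_line=FRAMES_PER_LINE):
    """Time to address every line once, in seconds."""
    return lines * frames_per_line * FRAME_US / 1_000_000


def scan(off_hook, lines=MAX_LINES):
    """Make one pass over all lines and list those reported off-hook.

    off_hook is the set of line numbers with loop current flowing.
    Ascending scan order is an assumption.
    """
    reported = []
    for line in range(1, lines + 1):
        bit = 1 if line in off_hook else 0   # the "1" returned to central
        if bit:
            reported.append(line)
    return reported


if __name__ == "__main__":
    print(scan_cycle_seconds())   # close to one-eighth of a second
    print(scan({200, 7}))
```

The complete cycle takes 255 × 4 × 125 microseconds, or 0.1275 second, which agrees with the figure of about one-eighth second.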

Dialing signals are “in-band.” Frequency-shift dialing with one
frequency for “make” and another for “break” is used, and a form of 
multifrequency or pulse-coded signals may be employed. Registers con- 
nected to trunkors will be used to interpret the digits which will be 
assembled in the memory of the main common control. This is so similar 
to dialing methods in other electronic switching systems that no further 
detail will be given here. 


3.4 Delay, Synchronization and Timing 


The synchronization of a point-to-point four-wire PCM transmission 
system is straightforward. A clock times the sending end, and the re- 
ceiving end is slaved to it. The same operation is used in the opposite 
direction. Synchronization between the two directions is not required. 
In the ESSEX system, which uses two-wire switching at the remote 
terminals operating under a central common control, over-all syn- 
chronization is necessary. It is a problem; unless all switches operate 
at the proper time, chaos will reign. The transmission delay, about 7 
microseconds per mile for cable pairs, further complicates the problem. 
This problem was analyzed by Karnaugh, who has offered several
solutions.’ One of these, the use of a delay pad, was adopted for use in the
laboratory model. 

Consider the complete concentrator shown in Fig. 3, which illustrates 
the delay-pad solution to the problem. Assume that control words are 
stored in the memory and that PCM words appear on the junctor pair, 
going in both directions at time τ. Just before τ, an eight-bit word is
sent out to the remote unit and the central switch. The central switch
closes at τ and permits the seven-bit word to go out over the R pair
to the remote unit, so that it arrives there at τ + a, where a is the
transmission delay. The eight-bit word on the C pair controls the line
gate, so that it opens at τ + a and the decoded sample that arrived on
the R pair passes to the line. Now, in the opposite direction, a sample
from the line passes to the coder and then to the S bus with some delay,
d. The PCM representation of the sample arrives at the central switch,
with an additional delay, a, at the time


t = τ + a + d + a.


Every 125 microseconds after τ, the five-bit word again closes the
central switch, and, if τ + 2a + d = 125 microseconds, the PCM signal
would arrive just in time to pass through the junctor gate. However, a is
dependent on the distance between the remote and central units, so the
condition above is not, in general, satisfied. But it can be satisfied if a
variable delay pad, x, is included in the send line, so that


τ + 2a + d + x = n(125 microseconds).


A variable-length delay line provides the necessary delay x. Each con- 
centrator, trunkor and intermodule junctor must be padded with such 
a delay to provide proper operation. Since the transmission time, a, 
varies slightly with temperature, a servo unit on the delay line auto- 
matically compensates for these small changes in a. 
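
The size of the pad can be estimated with a short calculation. The Python below is a sketch under stated assumptions: τ is taken as the time origin (τ = 0), so that the round-trip delay 2a + d + x must equal a whole number of frames. The coder delay of 10 microseconds used in the example is hypothetical, and the function names are introduced here for illustration only. The cable delay of 7 microseconds per mile is the figure given above.

```python
FRAME_US = 125.0                 # frame period, microseconds
CABLE_DELAY_US_PER_MILE = 7.0    # approximate delay of cable pairs


def one_way_delay_us(miles):
    """Transmission delay a for a given cable length."""
    return miles * CABLE_DELAY_US_PER_MILE


def pad_delay_us(a_us, d_us, frame_us=FRAME_US):
    """Smallest non-negative x for which 2a + d + x is a whole number of frames.

    Assumes tau is the time origin, so only the round-trip delay matters.
    """
    return (-(2.0 * a_us + d_us)) % frame_us


if __name__ == "__main__":
    a = one_way_delay_us(2)         # remote unit 2 miles from the office
    d = 10.0                        # hypothetical coder delay, microseconds
    x = pad_delay_us(a, d)
    print(a, x, (2 * a + d + x) / FRAME_US)
```

For a longer cable the round trip may exceed one frame, in which case n is greater than 1 and the pad makes up the remainder of the next frame. The servo described above would then trim x as a drifts with temperature.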

The clock at the remote unit is a crystal-controlled unit which is
slaved to the master clock at the office module by monitoring the pulses 
on the C pair. Counting circuits are used to produce timing pulses at 
submultiples of the clock frequency. Once every 125 microseconds, a 
framing signal is sent in the 24th time slot on the R pair to the remote 
unit. When this signal is recognized, all counting circuits are checked, 
and, if out of frame, they are reframed. 


3.5 Modules 


The term ‘‘module” has been used in some of the preceding sections. 
It denotes a building block whose cost or complexity is significantly 
greater than the cost or complexity of connecting it to the system.
ESSEX has a hierarchy of modules. The smallest one is the line package
used to switch voice-frequency lines selectively to a group of PAM 
time-separated channels. It is a plug-in piece of hardware added at the 
time a line is put into service. 

The next larger module is the concentrator, including both remote and 
central units. It is the basic multifunction unit in the model. The central 
office is built up of concentrators and trunkor modules and a common 
control unit. The whole switching and transmission array is made by 
running wires between such units. The proposal is to install sockets with 
junctor wires between them and to have the office grow by plugging 
switching units into these sockets. 

The system as outlined so far permits the operation of only one switch 
in a time slot at the concentrator. For heavy intraconcentrator traffic, 
it is desirable to operate two switches simultaneously so that only one 
time slot is required. In this case, speech is switched directly in PAM 
form and does not pass to the central point. One way to operate two 
switches in one time slot is to provide extra line memory in the con- 
centrator control, an extra control pair and an additional selector at the 
remote unit to control the second switch in a time slot. Such an arrange- 
ment could handle a maximum of 23 simultaneous calls between 46 
customers in one remote concentrator. A concentrator module with these 
additions could then serve as a community dial office (CDO) or as a 
PBX with centralized control. 

The largest module would be called the modular center. It would be 
made up of several concentrators and trunkors and use about 30 junctor 
pairs to serve between 2500 and 4000 lines, depending on the amount 
and characteristics of the traffic. Such a modular center could be located 
to minimize cable plant. Growth in an area could be handled by adding 
these central modules. For instance, a 10,000-line office would be made 
up of four modules, as shown in Fig. 8. These units might be intercon- 
nected by four-wire PCM junctors equipped with delay pads. With the 
present plan, each office module would have its own common control 
unit. Communication between the common control units in different 
modules would use the eighth-bit position of each time slot. A 192 
kilopulse per second channel is proposed for this purpose. Office modules, 
distributed over an area, might use a single common office code, the 
same office code for each one until 10,000 numbers are used. This would 
help to conserve office codes. 
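
Two of the figures in this paragraph can be verified directly. The Python below is an illustrative sketch; the function names are assumptions. It shows that one pulse position in each of the 24 time slots yields a 192-kilopulse-per-second channel, and that a 10,000-line office needs four modules when each module serves 2500 lines, the lower end of the range quoted above.

```python
SAMPLES_PER_SECOND = 8000   # frames per second
SLOTS_PER_FRAME = 24        # time slots in each frame


def eighth_bit_rate():
    """Pulses per second when one pulse position of every slot is used."""
    return SLOTS_PER_FRAME * SAMPLES_PER_SECOND


def modules_needed(lines, lines_per_module):
    """Number of office modules required, rounded up."""
    return -(-lines // lines_per_module)   # ceiling division on integers


if __name__ == "__main__":
    print(eighth_bit_rate())
    print(modules_needed(10_000, 2500))
    print(modules_needed(10_000, 4000))
```

At the upper end of the range, 4000 lines per module, three modules would suffice for the same office.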

The use of small central modules is only one way to handle central 
switching. Many valid arguments can be advanced to show that it is 
wasteful of common control equipment. There are many plans to have
one common control serve several office modules, but these are beyond 
the scope of this paper. 


3.6 Use for Other Services 


Digital data signals in the voice-frequency band may be handled 
through line circuits just as they are now handled. High-speed baseband 
signals could be applied to the PCM channels through switches at the 
output of the encoder, and incoming data pulses may be taken out 
through switches ahead of the decoder. Since each channel handles eight 
bits in a time slot, one channel will handle 64,000 bits per second. If 
higher data rates are needed, more channels can be allocated for this 
purpose. The complete group could handle 23 × 64,000 bits per second.
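
The allocation of channels to a data service can be sketched as follows. The Python below is illustrative only; the function and constant names are assumptions, and the rule of rounding up to a whole number of channels is the simplest one consistent with the text, not a described feature of the model.

```python
BITS_PER_SLOT = 8         # pulse positions in one time slot
FRAMES_PER_SECOND = 8000  # each channel recurs once per frame
SPEECH_CHANNELS = 23      # channels available in the group

CHANNEL_BPS = BITS_PER_SLOT * FRAMES_PER_SECOND   # bits per second per channel


def channels_for_rate(bits_per_second):
    """Whole channels needed to carry a data stream of the given rate."""
    if bits_per_second <= 0:
        raise ValueError("rate must be positive")
    needed = -(-bits_per_second // CHANNEL_BPS)    # ceiling division
    if needed > SPEECH_CHANNELS:
        raise ValueError("rate exceeds the capacity of the 23-channel group")
    return needed


if __name__ == "__main__":
    print(CHANNEL_BPS)
    print(channels_for_rate(200_000))
    print(SPEECH_CHANNELS * CHANNEL_BPS)   # capacity of the complete group
```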

Broader-band analog channels may be made available by changing 
the line package filter and by changing the sampling rate. If the address 
of a particular line terminal were stored in position n in the memory 
and again in position (n + 12) on a modulo 24 basis, the sampling rate 
for the line would then be 2 × 8000, or 16,000 times per second. If it
were stored in positions n, (n + 6), (n + 12) and (n + 18) on a modulo
24 basis, the sampling rate would then be 32,000 times per second. Thus,
by using several time slots in an ordered sequence for one line, the
sampling rate may be increased, and a wider band may be provided.
This is another type of flexibility. 
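
The placement of a line address in several evenly spaced memory positions can be expressed compactly. The Python below is a sketch; numbering the positions 0 to 23 and the function names are assumptions made here for illustration.

```python
FRAME_POSITIONS = 24   # memory positions, one per time slot
BASE_RATE = 8000       # samples per second for a single position


def positions_for_line(n, multiple):
    """Evenly spaced positions, modulo 24, for sampling a line `multiple` times per frame."""
    if multiple < 1 or FRAME_POSITIONS % multiple != 0:
        raise ValueError("multiple must divide the number of positions")
    step = FRAME_POSITIONS // multiple
    return [(n + k * step) % FRAME_POSITIONS for k in range(multiple)]


def sampling_rate(multiple):
    """Resulting sampling rate for the line, in samples per second."""
    return multiple * BASE_RATE


if __name__ == "__main__":
    print(positions_for_line(2, 2), sampling_rate(2))
    print(positions_for_line(20, 4), sampling_rate(4))
```

Even spacing matters because the samples of the line must be uniformly distributed in time for the higher sampling rate to provide the wider band.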

Fig. 8. Interconnection of central switching modules.


IV. THE LABORATORY MODEL


The prime objective of the experiment is to demonstrate the technical 
feasibility of the system. Such research system experiments increase
understanding of the operation of the system, unearth new problems
caused by interaction of complex circuits which have been demonstrated
individually, and provide the stimulation to invent new circuits and 
techniques to solve these problems. 

Most research systems are highly skeletonized. This one is skeletonized 
only in the number of concentrators and trunkors. Two complete con- 
centrators and one complete trunkor, along with a central clock, make 
up the model. Although each of these modules is capable of handling the 
maximum number of lines or trunks, the number of operating line
packages is limited to 12 per unit. However, a plugboard arrangement 
is provided, so that each package may be associated with any terminal 
on the selector. 

Before discussing the model, it is appropriate to describe how it grew 
to its present state. 

Initially, two partially equipped remote concentrators with a central 
memory were connected together, with no delay between them. Each 
unit contained a two-wire PAM selective switching stage, a time- 
division hybrid or two-wire to four-wire converter, and synchronizing 
and timing circuits. Tests were made on this phase of the system until 
the PCM equipment became available. In the next phase, the trans- 
mission path was opened, and PCM coding and decoding equipment 
complete with compressors and expandors were installed. These units 
introduced some delay, so the delay pads were added. The latest phase 
builds the system up to include two complete concentrators, a complete 
trunkor and an operator console to simulate many of the functions of 
the common control. This gradual evolution of the model has made it 
possible for one phase of the evolution to provide most of the environ- 
ment for testing the circuits added in the next phase. It is a case where 
serial construction has saved a lot of work by reducing the amount of 
equipment needed to synthesize input-output gear that would have been 
needed to proceed in parallel on several parts at once. 


4.1 Layout of Model 


A layout of the laboratory model is shown in Fig. 9. The two racks 
at the left are a remote concentrator unit. The first rack contains the 
line selector, which takes the incoming serial eight bits of a word from 
the C lead, assembles them in parallel, selects a line gate and applies a 
sampling pulse to it. A group of 12 line gates is in the upper section of
the rack, which also houses the plugboard that permits the interconnection
of any package to any one of 255 selector terminals. The outputs of
the gates are 23 time-separated channels on a two-wire PAM bus. This 
two-wire bus is connected to the second cabinet, which contains the 
time-division hybrid, encoders, decoders, synchronizing and timing 
circuits and other circuits common to the remote unit. The outputs from 
the second rack are three exchange-area type 22-gauge cable pairs, which 
handle digital signals at a 1.536-mc pulse rate. These cable pairs require
a pulse regenerative repeater for each 6000 feet. 

A second remote unit is located in the two racks at the right side of 
the figure. 

The remote units are the only places where analog signals are handled. 
Since each of these units represents a small part of an office, the crosstalk 
problem is simplified, because the exposure to other circuits is minimized. 
This is an important feature in the organization of “single-wire-to- 
ground” switching systems. 

The six racks in the center are all part of the office module. The third 
and fifth racks from the left are the control units for the two concen- 
trators. Each one contains the delay pads, the three memory units, 
control circuits and logic circuits for a remote unit. 

The sixth rack is the trunkor control unit, which is much the same as 
a concentrator control unit. The trunkor unit, which converts from
PCM to PAM and selectively switches voice-frequency terminals for
trunks, registers, etc., is located in the seventh rack.

The fourth rack has room for the printed circuit cards for a central 
stage switch serving 30 concentrators or trunkors. Each card holds the 
central switches and selector for one concentrator. It is presently 
equipped with only three switch cards, one for each of the two con- 
centrators and one for the trunkor. 

The arrangement for the junctor wiring is shown in Fig. 10. The upper 
half of this rack represents the complete central switching network for a 
4000-line office module. The concentrator controls are connected by 
plugs, with only 420 wires being required to connect all 30 units, ex- 
clusive of power and clock pulses. The small size of this network demon- 
strates a major advantage afforded by PCM switching. Since the 
circuitry at the central module is all digital, the crosstalk in the wiring 
presents no formidable problem. The lower half of the fourth rack holds 
the master clock and timing circuits for the office module. Care must 
be taken in distributing timing pulses to the various control units to 
assure that timing pulses arrive at each one at the same time, plus or
minus 10 millimicroseconds. This section also contains circuits to generate
the supervisory control tones. These tones are switched, when
needed, to the concentrator R lead under control of call progress marks 
in its memory. 





THE BELL SYSTEM TECHNICAL JOURNAL, JULY 1959










Fig. 10 — Junctor wiring at rear of connectors for central stage switches. 


A control console takes the place of the office common control. It 
provides a visual display of calling number, called number, time slot 
number, etc. An operator manually performs operations on the console
to interpret instructions from the concentrator controls and to issue 
orders to them for setting up and taking down calls. The console is 
located in front of the group of racks and is used for testing and demon- 
strating the system. 

Fig. 11 is a photograph of a portion of the laboratory system. Power
supplies for the system, not shown in the layout, are mounted in racks
similar to the others.

Fig. 11 — Section of the laboratory system


4.2 Circuitry 

Most of the circuitry is isochronous. Timing is furnished by a two-
phase 1.536-mc clock. Rise and fall times on the order of 50 millimicro-
seconds are common.


The circuits in the model use commercially available transistors and
diodes in conventional arrays. High-speed operation is achieved by use
of clipping and clamping techniques, with a collector supply voltage
much higher than the normal signal voltages. Since the emphasis is on
the system, the circuits are not necessarily minimal. They are assembled
on 5 × 8-inch wiring boards, which plug into sockets for convenience
in testing and replacement. The boards or packages contain groups of
basic building blocks, such as diode logic units, flip-flops, shift register
stages, pulse amplifiers and blocking oscillators. The laboratory model
uses about 4000 transistors and 12,000 diodes.

Most of the circuits use voltages in the range of —12 to +12 volts, 
with some collector supplies being as high as 25 volts. The total power 
drain for a remote concentrator is about 75 watts, with an additional 
% watt for each operating subscriber set. A concentrator control unit 
requires about 50 watts. 

Many unifunction circuits that probably arouse curiosity have been 
mentioned previously. The description of these would surely drag this 
paper too far into detail. It is planned to treat these details in two 
additional papers. However, it is unfair to leave the reader completely 
up in the air, so a few words are in order about some of these circuits. 

The subject of the line gates has been investigated for some time, and
has been published.® The time-division hybrid, a result of this
experiment, is a circuit which permits a sample from a line gate to be passed
to the “send” bus, held, stretched for coding, and then removed by a 
clamp so that there is no inter-time-slot crosstalk. Just after that sample 
passes through the common “send” gate and that gate is blocked, a 
common “receive” gate opens and passes an incoming PAM signal to 
the subscriber’s line filter through his line gate, which is still open. This 
PAM signal is then dissipated by the subset which terminates the line 
filter. Approximately 123 microseconds later, the “send” gating opera- 
tion is repeated. It is this time difference which provides the hybrid 
action, by preventing the receive signal from passing directly to the 
“send” bus. 

The delay pads and the memories use magnetostrictive delay lines 
with transistor drivers and amplifiers. A typical line is shown in Fig. 12. 
The line, a 3-mil supermendur wire, is mounted so that the delay may 
be set manually any place in the range from a few microseconds to 125 
microseconds. The servo unit for temperature correction is used only 
with the delay pads. It provides an automatic adjustment of ±1.5
microseconds.

These lines have a wider bandwidth than those in common use. The
pulses applied are baseband, and the pulse rate is 1.536 mc. The total
loss in the two transducers and the line is about 50 db. The drive circuit
uses two transistors and the receiving amplifier uses four, so that
the line with this associated circuitry is a delay unit with zero loss.


4.3 Performance 


The transmission characteristic of a channel between two voice-frequency
terminals, exclusive of the subscriber loops, shows a loss of
6 db ± 0.5 db from 100 to 3200 cps and 3 db additional loss at 75 and
3500 cps. A channel will handle up to 6 milliwatts of sine-wave power.
The signal-to-noise ratio is about 30 db. As mentioned above, the
characteristics are independent of the length of line between conversion
points and of the number of switching points.

Two research models of remote concentrators without controllers have 
been operating satisfactorily for more than six months. This part of the 
model uses about 1800 transistors and more than 5000 diodes. The 
performance of the components has exceeded all expectations for a 
research model. The model, as outlined, complete with controllers, 
trunkor and control console, has been operating for two months with 
equally satisfactory results. 

Facilities are available for making listening tests to compare
straight-through wire connections with the PCM connection, and only a few
people have been able to detect a difference between these conditions.
The quantizing noise on signals seems to be unnoticeable. The low-level
noise resulting from the indecision of the coders during silent periods
seems to be more bothersome than the quantizing noise on the higher
signals.


V. CONCLUSIONS 


A communication system concept has been described which uses time- 
separation techniques for transmission channels and for switching. It is 
primarily a digital system. In principle, the use of digital transmission 
with regenerative repeaters would provide fixed low loss and fixed quality 
connections between remote line concentrators. It would provide
flexibility for new services. It might be arranged in a modular manner to
handle growth, and to facilitate manufacture and installation. A full-size
office of this type would require less floor space than existing
electromechanical systems. A laboratory model using solid state devices
throughout has been built and tested. It demonstrates the technical 
feasibility of the concept and gives an indication of the number of 
components that might be needed for such a system.


Variable-Length Binary Encodings


(Manuscript received September 9, 1958)


This paper gives a theoretical treatment of several properties which de- 
scribe certain variable-length binary encodings of the sort which could be 
used for the storage or transmission of information. Some of these, such as 
the prefix and finite delay properties, deal with the time delay with which 
circuits can be built to decipher the encodings. The self-synchronizing prop- 
erty deals with the ability of the deciphering circuits to get in phase 
automatically with the enciphering circuits. Exhaustive encodings have the 
property that all possible sequences of binary digits can occur as messages. 
Alphabetical-order encodings are those for which the alphabetical order of 
the letters is preserved as the numerical order of the binary codes, and would 
be of possible value for sorting of data or consultation of files or dictionaries. 
Various theorems are proved about the relationships between these properties, 
and also about their relationship to the average number of binary digits used 
to encode each letter of the original message. 


I. INTRODUCTION 


Table I gives three different encodings for representing the letters of 
the alphabet and the space symbol in binary form. These encodings 
have several special properties which are of some interest. First, each 
is a variable-length encoding; that is, the code for each letter is a sequence 
of binary digits, but the codes assigned to different letters are not all 
required to consist of the same number of binary digits. The first two 
of these encodings have the prefix property; that is, no one of the codes 
is a prefix of any other code of the same encoding. This property makes
it easy to decipher a message: to find the first letter of the deciphered
message, it is only necessary to read binary digits of the message until
they agree with one of the codes.
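
The deciphering rule just stated can be sketched in a few lines of Python. This is only an illustration: the function names `encipher` and `decipher` are assumptions of the sketch, and the five codes are a subset of the Huffman column of Table I (any subset of an encoding with the prefix property also has it).

```python
# A few Huffman codes copied from Table I (illustrative subset).
HUFFMAN_SUBSET = {" ": "000", "A": "0100", "E": "101", "H": "1110", "T": "0010"}


def encipher(message, encoding):
    """Concatenate the codes of the successive letters of the message."""
    return "".join(encoding[letter] for letter in message)


def decipher(bits, encoding):
    """Read digits until they agree with a code, emit that letter, start over."""
    letter_for = {code: letter for letter, code in encoding.items()}
    letters, current = [], ""
    for bit in bits:
        current += bit
        if current in letter_for:  # prefix property: the first match is the letter
            letters.append(letter_for[current])
            current = ""
    if current:
        raise ValueError("digits left over after the last complete code: " + current)
    return "".join(letters)


enciphered = encipher("THE HAT", HUFFMAN_SUBSET)
recovered = decipher(enciphered, HUFFMAN_SUBSET)
```

Because no code is a prefix of another, the first agreement found while reading digits is always the correct letter, so no look-ahead is needed.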

The first of these encodings, called the Huffman encoding, is
constructed by the method given by Huffman,¹ and has the property of
being a minimum-redundancy encoding; that is, among all variable-
length binary encodings having the prefix property, this is an encoding
having the lowest possible cost (where the cost is defined as the average
number of binary digits used per letter of the original message, assuming
that the message is made up of letters independently chosen, each with
the probability given).


TABLE I

Letter   Probability   Huffman Code   Alphabetical Code   Special Code

Space    .1859         000            00                  00
A        .0642         0100           0100                0100
B        .0127         011111         010100              010100
C        .0218         11111          010101              010101
D        .0317         01011          01011               01011
E        .1031         101            0110                0110
F        .0208         001100         011100              011100
G        .0152         011101         011101              011101
H        .0467         1110           01111               01111
I        .0575         1000           1000                1000
J        .0008         0111001110     1001000             1000111111
K        .0049         01110010       1001001             100100
L        .0321         01010          100101              100101
M        .0198         001101         10011               10011
N        .0574         1001           1010                1010
O        .0632         0110           1011                1011
P        .0152         011110         110000              11000
Q        .0008         0111001101     110001              110001111111
R        .0484         1101           11001               11001
S        .0514         1100           1101                1101
T        .0796         0010           1110                1110
U        .0228         11110          111100              111100
V        .0083         0111000        111101              111101
W        .0175         001110         111110              111110
X        .0013         0111001100     1111110             1111101111111
Y        .0164         001111         11111110            1111110
Z        .0005         0111001111     11111111            11111101111111

Cost                   4.1195         4.1978


The second of these encodings, called the alphabetical encoding, has 
the property that the alphabetical order of the letters corresponds to 
the numerical binary order of the codes. Among all such alphabetical- 
order-preserving binary encodings that are of variable length and have 
the prefix property, the one given has been constructed to have the 
lowest possible cost. It can be seen that the cost 4.1978 of the alpha- 
betical encoding is quite close to the cost 4.1195 of the Huffman encoding, 
as compared to the cost 5 of the more conventional fixed-length encoding 
for the same alphabet, so that the alphabetical restriction adds surpris- 
ingly little expense to a variable-length encoding. 
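
The costs quoted above can be rechecked directly from the definition (probability times code length, summed over the alphabet). The following Python sketch does so; the tuple layout and the name `cost` are choices made for this illustration only, and the rows are transcribed from the probability, Huffman and alphabetical columns of Table I.

```python
import math

# (letter, probability, Huffman code, alphabetical code), as read from Table I.
TABLE_I = [
    (" ", 0.1859, "000", "00"),
    ("A", 0.0642, "0100", "0100"),
    ("B", 0.0127, "011111", "010100"),
    ("C", 0.0218, "11111", "010101"),
    ("D", 0.0317, "01011", "01011"),
    ("E", 0.1031, "101", "0110"),
    ("F", 0.0208, "001100", "011100"),
    ("G", 0.0152, "011101", "011101"),
    ("H", 0.0467, "1110", "01111"),
    ("I", 0.0575, "1000", "1000"),
    ("J", 0.0008, "0111001110", "1001000"),
    ("K", 0.0049, "01110010", "1001001"),
    ("L", 0.0321, "01010", "100101"),
    ("M", 0.0198, "001101", "10011"),
    ("N", 0.0574, "1001", "1010"),
    ("O", 0.0632, "0110", "1011"),
    ("P", 0.0152, "011110", "110000"),
    ("Q", 0.0008, "0111001101", "110001"),
    ("R", 0.0484, "1101", "11001"),
    ("S", 0.0514, "1100", "1101"),
    ("T", 0.0796, "0010", "1110"),
    ("U", 0.0228, "11110", "111100"),
    ("V", 0.0083, "0111000", "111101"),
    ("W", 0.0175, "001110", "111110"),
    ("X", 0.0013, "0111001100", "1111110"),
    ("Y", 0.0164, "001111", "11111110"),
    ("Z", 0.0005, "0111001111", "11111111"),
]


def cost(rows, column):
    """Average number of binary digits per letter for the chosen code column."""
    return sum(row[1] * len(row[column]) for row in rows)


huffman_cost = cost(TABLE_I, 2)
alphabetical_cost = cost(TABLE_I, 3)
# A fixed-length encoding of 27 symbols needs ceil(log2(27)) digits per letter.
fixed_length_cost = math.ceil(math.log2(len(TABLE_I)))
```

With these rows the two sums agree with the Cost line of Table I, and the fixed-length figure is the 5 digits per letter mentioned in the text.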

Part of this paper deals with the methods of constructing such best 
alphabetical encodings, and gives some theorems concerning their cost 
and their structure. However, this paper also includes theoretical results
about various properties of variable-length binary encodings in general.
The cost, the prefix property and unique decipherability have already 
been mentioned. The exhaustive property (roughly speaking, this 
permits all infinite binary sequences to occur as encoded messages) is 
also shown to be relevant, as is the finite delay property, which has to do 
with the amount of delay which must take place between receiving and 
deciphering the enciphered message. Various theorems are proved con- 
cerning the relationships of these properties to each other and to other 
properties. Some of these properties have also been considered by other
authors.¹⁻⁶

One property of special interest is the ability of certain variable-length 
encodings (but not of fixed-length encodings) to automatically syn- 
chronize the deciphering circuit with the enciphering circuit. This self- 
synchronizing property, while it has been previously mentioned, is a 
little-known property which might have practical significance in that it 
would permit binary deciphering machines using variable-length encod- 
ings to be built without requiring any special synchronizing circuits or 
synchronizing pulses, such as are needed for fixed-length encodings. 
Thus, there may be cases where (despite some present opinions to the 
contrary) variable-length encodings lend themselves to simpler instru- 
mentation than fixed-length encodings. 

Since the probabilities given in Table I are derived from one of the 
tables of frequencies of letters in English text,’ the encoding given 
should be reasonably efficient for encoding English words or phrases. 
The alphabetical property, together with the prefix property, implies
that two such words or phrases could be compared for alphabetical
order merely by putting the two entire phrases into a simple comparison
circuit of the kind which would be used to compare binary numbers. If
the two phrases begin with the same sequence of letters, the correspond- 
ing parts of their enciphered form would agree, and the outcome of the 
binary comparison would be determined by the comparison between 
the two binary codes corresponding to the first pair of letters which 
disagree. 
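
The comparison described in this paragraph can be imitated in software. The Python sketch below enciphers phrases with the alphabetical code of Table I, reads each result as a binary number with a binary point in front, and compares the numbers. The names `encipher`, `as_binary_fraction` and `precedes` are assumptions of the sketch, and the sample words are arbitrary.

```python
from fractions import Fraction

# Alphabetical codes of Table I, listed in the order space, A, B, ..., Z.
LETTERS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CODES = [
    "00", "0100", "010100", "010101", "01011", "0110", "011100", "011101",
    "01111", "1000", "1001000", "1001001", "100101", "10011", "1010", "1011",
    "110000", "110001", "11001", "1101", "1110", "111100", "111101", "111110",
    "1111110", "11111110", "11111111",
]
ENCODING = dict(zip(LETTERS, CODES))


def encipher(phrase):
    """Concatenate the alphabetical codes of the letters of a non-empty phrase."""
    return "".join(ENCODING[letter] for letter in phrase)


def as_binary_fraction(bits):
    """Value of the digits read as a binary number with a binary point in front."""
    return Fraction(int(bits, 2), 2 ** len(bits))


def precedes(first, second):
    """True if `first` comes before `second` according to the binary comparison."""
    return as_binary_fraction(encipher(first)) < as_binary_fraction(encipher(second))


words = ["TIME", "SWITCH", "CODE", "BELL SYSTEM", "BELL", "CODES", "A"]
by_binary_value = sorted(words, key=lambda w: as_binary_fraction(encipher(w)))
```

One caveat of this numerical reading: since the space has the code 00, a phrase and the same phrase followed by spaces have equal values, so trailing spaces are not distinguished.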

Placing the space symbol before the letter A of the alphabet corre- 
sponds to the usual convention governing the filing of multiple-word 
entries in alphabetical order, although if it were desired also to include 
punctuation marks or numerals in the alphabet, the conventions are not 
so universal, and might not be of the sort which can easily be expressed 
in a binary encoding. 

An alphabetical encoding might be used as a means of saving memory 
space needed for names or other alphabetical data that are to be sorted
into alphabetical order on a data-processing machine or are to be stored
in a file in alphabetical order. Similarly, it might be used for the words of 
a dictionary as a part of a language-translating machine, if it were 
desired to preserve the conventional alphabetical order of dictionaries. 
In addition to possible savings of memory space, it might be used to find 
entries in such a dictionary more quickly. Since the low redundancy of 
this encoding causes the digits 0 and 1 to be used with more nearly equal 
frequency and more nearly independently than in a fixed-length encod- 
ing, the binary numerical value associated with each word would increase 
more nearly as a linear function of distance progressed through the 
dictionary; hence, instead of searching for a given word by the method 
of successively halving the interval in which it is known to lie, linear 
interpolation (or some rough approximation to it which might be done 
by a simpler circuit) could be used to speed up convergence. For uses
such as those mentioned here, however, the particular alphabetical encoding
given in Table I is not necessarily the optimum, since the frequencies of
occurrence of letters in names or in dictionary entries are undoubtedly
different from those in connected English text. The methods
given in this paper would nevertheless enable such an encoding to be
obtained for any given probability distribution.


II. TERMINOLOGY


We will use the word letter to refer to any symbol of some designated
list, including even the space symbol of Table I. By an alphabet we will
mean a set of letters. We will usually require each member of an alphabet
to have associated with it a probability of occurrence, and we will also
usually require that some linear ordering relationship (which we will
call alphabetical order) be defined for the letters of this alphabet. So that
we may call any subset of the letters of an alphabet a subalphabet, and
may keep the same ordering and the same probabilities, we will require 
only that the sum of the probabilities be less than or equal to one. All 
of the alphabets considered in this paper have only a finite number n 
of letters, but it might be advisable to allow countably infinite alphabets 
in certain further theoretical extensions of this subject. 

A message is a finite sequence of letters, or an infinite sequence L_1 L_2 L_3 ...
which extends infinitely only into the future, not into the past. We
will consider a source which generates messages in which successive 
letters occur independently and with the given probabilities. However, 
in case the sum of the probabilities is less than one, we may imagine that 
the probabilities are proportionately increased just enough that their 
sum becomes one, so that the associated source is more realistic.

We distinguish between code and encoding, both of which are often
called codes by other writers. A code is a finite sequence of binary digits.
An encoding is a way of associating (or more formally, a function C
which associates) a code C_i with each letter L_i of an alphabet.

The operation of enciphering (elsewhere often called encoding) con- 
structs a sequence of binary digits which is made up of the code for the 
first letter of the message, followed immediately by the code for the 
second letter of the message, etc. Any message then produces a sequence
of binary digits called the enciphered message. Any machine or circuit
which does the operation of enciphering is called an enciphering machine
or an enciphering circuit. The enciphered message of a finite message is
obviously always finite. 

An encoding will be said to be uniquely decipherable if, for each finite 
enciphered message, there exists exactly one original message which 
could have produced it. If an encoding is uniquely decipherable, then 
there is obviously a procedure for deciphering any finite enciphered 
message (by enumeration, for instance), and any machine or circuit 
capable of doing this will be called a deciphering machine or a deciphering 
circuit. 

Following Huffman,¹ we define a prefix of any sequence β of binary
digits to be any finite sequence which is either β itself or is obtainable
by deleting all of the digits after a given point of β. For example, the
prefixes of 10110 are 10110, 1011, 101, 10, 1, and the null sequence, 
which has no digits. We will say that an encoding C has the prefix 
property if no code of C is a prefix of any other code of C. 
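
The two definitions above translate directly into code. The following Python sketch is illustrative only; the function names `prefixes` and `has_prefix_property` are not from the paper.

```python
def prefixes(sequence):
    """All prefixes of a finite binary sequence, longest first, ending with
    the null sequence (represented here by the empty string)."""
    return [sequence[:k] for k in range(len(sequence), -1, -1)]


def has_prefix_property(codes):
    """True if no code in the list is a prefix of any other code in the list.

    Codes are compared by position, so two letters sharing the same code
    count as a violation.
    """
    codes = list(codes)
    return not any(
        i != j and longer.startswith(shorter)
        for i, shorter in enumerate(codes)
        for j, longer in enumerate(codes)
    )


example = prefixes("10110")
```

For instance, the list 0, 10, 11 has the prefix property, while 0, 01, 11 does not, because 0 is a prefix of 01.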

By a presumed message we will mean a finite or infinite sequence β
of binary digits such that every prefix of β is a prefix of the enciphered
form of some message. Then, at any given time while a presumed message 
is being sent into a deciphering machine, it is indistinguishable from a 
message, so it makes sense to allow presumed messages as well as mes- 
sages to be the class of sequences which can be sent into a deciphering 
machine. 


III. THE ENCODING THEOREM FOR ALPHABETICAL ENCODINGS


Consider a discrete source S which uses the alphabet: space, A, B,
..., Z (any other linearly ordered alphabet will also serve). An encoding
of blocks of N letters into binary sequences will be called an alphabetical
encoding if it is uniquely decipherable and the codes for the blocks in
alphabetical (dictionary) order are themselves in numerical order. Here
the codes are imagined to be prefixed by binary points to convert them
into numbers in binary form. The alphabetical encoding of Table I is a
case with N = 1. It is a natural question to ask if a restriction to
alphabetical encodings may not be severe for some sources S. In particular,
are the results of Shannon’s encoding theorem (Ref. 8, Theorem 9) 
still obtainable with alphabetical encodings? 

Shannon proved that the output of a discrete source having entropy 
H bits per character can be enciphered in a uniquely decipherable manner 
into a sequence of binary digits so that the average number of digits 
used per character exceeds H by an arbitrarily small amount. Shannon’s 
construction encodes blocks of N source characters into binary sequences, 
using a cost (average number of binary digits per character) H_N which
satisfies

    N G_N \le N H_N \le N G_N + 1.    (1)

Here, N G_N is Shannon’s notation for the information contained in a
block of N characters produced by the source; i.e.,

    N G_N = -\sum_i p_i \log p_i,    (2)

in which the p_i are the N-gram probabilities of the source. Then, since

    \lim_{N \to \infty} G_N = H,

Shannon’s theorem

    \lim_{N \to \infty} H_N = H    (3)

follows from (1). Since N G_N must be a lower bound on the average
number of digits used to encode a block of N characters by any means 
whatever, (1) shows that Shannon’s construction is not far from the 
best possible one for block encoding. We now give a similar theorem 
for alphabetical encoding. 

Theorem 1: Let S be a source producing messages which may be ordered
(alphabetically). Let G_N be computed from the N-gram probabilities p_i of
S by (2). There exists a uniquely decipherable alphabetical encoding of
blocks of N characters of S into sequences of binary digits for which the
cost, H_N, satisfies

    N G_N \le N H_N \le N G_N + 2.    (4)

By picking N large enough, H_N may be made arbitrarily close to the entropy
H of S in bits per character.

Proof: The proof is adapted from Shannon’s⁸ proof of his Theorem 9.
Let all possible blocks of N source characters be listed in alphabetical
order, and let p_i denote the probability of the ith block in the list (recall
that Shannon lists his blocks in order of probability rather than
alphabetically). Let m_i be the integer for which

    2^{-m_i} \le p_i < 2^{1-m_i}.    (5)

Also, let numbers A_1, A_2, A_3, ... be defined by

    A_1 = \frac{p_1}{2},
    A_2 = p_1 + \frac{p_2}{2},
    \cdots
    A_i = (p_1 + \cdots + p_{i-1}) + \frac{p_i}{2}.

Note that

    0 \le A_1 \le A_2 \le \cdots \le 1.



We now construct an alphabetical encoding. The code for the ith block
will be the first m_i + 1 digits of the binary expansion of the number A_i.
In Shannon’s encoding this same block has a code formed by expanding
a (different) number to m_i places. Then our scheme uses only one more
digit than does Shannon’s for each block, so that our N H_N exceeds the
corresponding quantity for Shannon’s encoding by exactly 1, and (4)
follows from (1). It remains now to show that our encoding is uniquely
decipherable; i.e., that the sequence of letters generated by S may be 
reconstructed from the binary digits. 

It suffices to prove that our construction produces a list of codes which 
have the prefix property. Then the enciphered message produced by 
each block of N letters may be deciphered as soon as all its digits have 
been received. 

To prove that our list has the prefix property, consider any two blocks
of letters, say the ith and the jth with i < j. By (5),

    A_j \ge A_i + \frac{p_i}{2} + \frac{p_j}{2},

and

    A_j \ge A_i + 2^{-m_i-1} + 2^{-m_j-1}.    (6)

If p_i \le p_j, then m_i \ge m_j; but, by (6), the jth code cannot be identically
the same as the first 1 + m_j places of the ith code. Similarly, if p_i \ge p_j,
the ith code cannot be a prefix of the jth code. Thus, the prefix property,
and the theorem, follow.

Except in the case of an alphabet having only one letter, the prefix
property is sufficient to insure unique decipherability, but it is not
necessary. For example, the list 0, 01, 11 does not have the prefix
property; still it could be used. In a received message 00001111 ... there
would be no doubt about the first three 0’s, and the fourth 0 would be 
recognized as 01 or not according to whether an odd or even number of 
1’s followed it. 
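
The example list 0, 01, 11 can be explored with a brute-force decipherer that enumerates every way of cutting a finite enciphered message into codes. The Python sketch below is illustrative only: the function name `decipherings` and the letter names a, b, c attached to the three codes are assumptions of the sketch. A second, hypothetical list 0, 01, 10 is included to show what a failure of unique decipherability looks like.

```python
def decipherings(bits, encoding):
    """Return every message whose enciphered form is exactly `bits`.

    The search is by plain enumeration, so it may take exponential time
    on long inputs; it is meant only for small examples.
    """
    if bits == "":
        return [""]
    results = []
    for letter, code in encoding.items():
        if code and bits.startswith(code):
            rest = decipherings(bits[len(code):], encoding)
            results.extend(letter + tail for tail in rest)
    return results


# The list 0, 01, 11 from the text, with hypothetical letter names.
NOT_PREFIX_BUT_UNIQUE = {"a": "0", "b": "01", "c": "11"}
# A hypothetical list that is not uniquely decipherable.
AMBIGUOUS = {"a": "0", "b": "01", "c": "10"}

even_ones = decipherings("00001111", NOT_PREFIX_BUT_UNIQUE)
odd_ones = decipherings("0000111", NOT_PREFIX_BUT_UNIQUE)
two_readings = decipherings("010", AMBIGUOUS)
```

With an even number of 1's the fourth 0 stands alone, and with an odd number it is read as part of 01, in agreement with the rule given in the text.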

However, by a best alphabetical encoding we will mean an encoding 
which has the lowest cost among all alphabetical encodings which have 
the prefix property. This insistence upon the prefix property will make 
it possible for us to prove Theorems 2 through 5 and give constructive 
methods for finding these best alphabetical encodings. 

If we use the construction just described to design an alphabetical
encoding of English with N = 1, we obtain a cost of 5.75 digits per
character. As guaranteed by the theorem, this cost is less than G_1 +
2 = 6.08. However, we could have done better by simply assigning a
five-digit code to each letter. The encoding can be much improved by
deleting some digits which are obviously not needed. For example, the
first few codes are those listed in Table II. Clearly the code 00110 for A
is too long. As soon as the prefix 001 is received, A is the only possibility.
The final digits 10 may be deleted. Similarly, the other codes may be
shortened, as indicated in Table II, until no code can lose a final digit
without becoming a prefix for some other code. The cost is thereby
reduced to 4.44 digits per character.


TABLE II

Letter   Code        Shortened Code

Space    0001        000
A        00110       001
B        01000001    010000
C        0100011     010001

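
The construction in the proof of Theorem 1, followed by the shortening step, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses only the first four letters of Table I as a subalphabet (the A_i of these letters depend only on the letters before them), exact rational arithmetic from Python's `fractions` module, and function names invented for the sketch. The shortening loop is one straightforward reading of "until no code can lose a final digit"; in the full alphabet the letters after C also constrain the shortening.

```python
from fractions import Fraction

# Probabilities of the first four letters of Table I, in alphabetical order.
PROBS = [Fraction(s) for s in ("0.1859", "0.0642", "0.0127", "0.0218")]


def digits_needed(p):
    """Return the integer m with 2**-m <= p < 2**(1 - m), for 0 < p <= 1."""
    m = 0
    while Fraction(1, 2 ** m) > p:
        m += 1
    return m


def binary_expansion(x, places):
    """First `places` binary digits of a number x with 0 <= x < 1."""
    digits = []
    for _ in range(places):
        x *= 2
        digit = int(x >= 1)
        digits.append(str(digit))
        x -= digit
    return "".join(digits)


def theorem1_codes(probs):
    """Code i is the first m_i + 1 digits of A_i = p_1 + ... + p_(i-1) + p_i / 2."""
    codes, cumulative = [], Fraction(0)
    for p in probs:
        codes.append(binary_expansion(cumulative + p / 2, digits_needed(p) + 1))
        cumulative += p
    return codes


def shorten(codes):
    """Drop final digits one at a time while the prefix property survives."""
    codes = list(codes)
    changed = True
    while changed:
        changed = False
        for i, code in enumerate(codes):
            candidate = code[:-1]
            others = codes[:i] + codes[i + 1:]
            clash = any(o.startswith(candidate) or candidate.startswith(o) for o in others)
            if candidate and not clash:
                codes[i] = candidate
                changed = True
    return codes


full_codes = theorem1_codes(PROBS)
short_codes = shorten(full_codes)
```

For these four letters the sketch gives the same codes and shortened codes as the rows of Table II.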

A different encoding is obtainable using the same sort of construction
but with

    A_i = \sum_{j=1}^{i-1} 2^{-m_j} + 2^{-m_i-1}.

The same proof can be used, since (6) still holds. Since the code lengths
are again the numbers m_i + 1, the new encoding will have the same cost.
The numbers A_i can now be computed with ease directly in the binary
system, and much of the arithmetic needed for the first construction
may be avoided. However, the kind of shortening used in Table II does
not work as well with the new encoding. All codes (as numbers) are now
less than

    \sum_i 2^{-m_i}.

This number need not be near 1 (typically it is about 2). The codes are
then cramped together in a range smaller than (0,1) and cannot be
shortened as much. For the case of the English source with N = 1, the
new encoding can only be shortened to cost 5.02 digits per letter.
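
The second construction can be sketched in the same way. The code below is an illustration under the same assumptions as before: only the first four letters of Table I are used, and the function names are invented for the sketch. It also returns the running sum of the 2^{-m_i}, which is the bound below which all the codes, read as numbers, must lie.

```python
from fractions import Fraction


def digits_needed(p):
    """Return the integer m with 2**-m <= p < 2**(1 - m), for 0 < p <= 1."""
    m = 0
    while Fraction(1, 2 ** m) > p:
        m += 1
    return m


def second_construction(probs):
    """Codes from A_i = sum over j < i of 2**-m_j, plus 2**-(m_i + 1),
    each cut to m_i + 1 binary digits. Also returns the sum of all 2**-m_i."""
    codes, total = [], Fraction(0)
    for p in probs:
        m = digits_needed(p)
        a = total + Fraction(1, 2 ** (m + 1))
        # Truncate a to m + 1 binary places and write it with leading zeros.
        codes.append(format(int(a * 2 ** (m + 1)), "0{}b".format(m + 1)))
        total += Fraction(1, 2 ** m)
    return codes, total


FIRST_FOUR = [Fraction(s) for s in ("0.1859", "0.0642", "0.0127", "0.0218")]
codes, bound = second_construction(FIRST_FOUR)
```

Only powers of two are added, so the arithmetic is exact in binary. For these four letters the codes are numerically smaller than those of the first construction, which illustrates the cramping described in the text.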


IV. ENCODING TRICKS 


The simple construction just given does not produce the best encoding, 
i.e., the one with least cost. The best encoding can always be found by 
a systematic, although long, calculation which is described in the next 
section. Here we list a few tricks whereby the problem of finding the 
best encoding may be simplified and, in some cases, solved. 

We will describe these results in terms of encoding single letters into 
binary form; however, it is to be understood that blocks of N letters 
may always be considered the single letters of a larger alphabet. By a 
prefix set of an encoding we will mean the set of all letters which have 
codes beginning with a given prefix. For example, in the Huffman encod- 
ing of Table I the prefix 011 has the prefix set consisting of letters B, G, 
J, K, P, Q, V, X and Z. In an alphabetical encoding every prefix set 
must consist of all letters lying between some two fixed letters in the 
alphabet. 

The tricks to be described enable one to prove that certain collections 
of letters must be prefix sets in any best alphabetical encoding. Whenever 
a prefix set is known the encoding problem can then be reduced as follows 
to one for a smaller alphabet. 

Theorem 2: In a best alphabetical encoding let S be a prefix set for a
prefix π. Construct a shorter alphabet by replacing the letters of S by a
single new letter, L', occupying their place in alphabetical order and having
as its probability the sum of their probabilities. A best encoding of the new
alphabet gives L' the code π and gives every other letter its old code.

Proof: Let C(L) denote the code for letter L in the original best
encoding. Suppose, contrary to the theorem, that the new problem had
a better solution in which L, L' had codes C'(L) and C'(L'). One would
then obtain a better solution of the original problem by encoding L into
C'(L). The code for a letter M in the prefix set would be C(M) with
the prefix π changed to C'(L').

Huffman’s encoding scheme uses a result similar to Theorem 2 for
nonalphabetical encodings. The two letters of lowest probability must
form a prefix set, and this result is used again and again, until there are
only two letters left and the problem is solved. When the encoding must
be alphabetical one cannot always find a prefix set easily. Some results
in this direction are given by the following theorems. The symbols
L_1, L_2, ... are used to represent the letters of the alphabet in order;
p_1, p_2, ... will be their probabilities; C(L_1), C(L_2), ... will be their
codes in the encoding C and N_1, N_2, ... will be the numbers of binary
digits in their codes. Also, if β is any code or any prefix, N(β) will be
used to represent the number of binary digits in β.

An encoding will be said to be exhaustive if it encodes an alphabet of 
two or more letters in a uniquely decipherable manner and, for every 
infinite sequence x = x_1 x_2 x_3 ... of binary digits, there is some message
which can be enciphered as x; or if it encodes an alphabet of one letter
by using the null sequence. 

Theorem 3: Every best alphabetical encoding is exhaustive. 

Proof: Consider an encoding of an alphabet having two or more
letters which is alphabetical and has the prefix property, but is not
exhaustive. It will be shown that it is not a best encoding. Let $x$ be an
infinite sequence of binary digits such that no message can be encoded
as $x$. If any code of the encoding is a prefix of $x$, remove it from $x$, and,
after a finite number of repetitions of this process, an $x$ will be obtained
which has no one of the codes for a prefix. Let $\pi$ be the greatest prefix of
$x$ which is also a prefix of any one of the codes. Let $C_i$ be some code of
which $\pi$ is a prefix. We will use $\pi 0$ to represent the sequence $\pi$ followed
by 0. Then either $\pi 0$ is a prefix of $C_i$ and $\pi 1$ is a prefix of $x$, and $\pi 1$ is
not a prefix of any code of this encoding; or else $\pi 1$ is a prefix of $C_i$ and
$\pi 0$ is a prefix of $x$, and $\pi 0$ is not a prefix of any code of this encoding.
Without loss of generality, we assume the second one of these alternatives.
Then consider the new encoding which agrees with the old one
for all codes not having $\pi$ as a prefix, but which has a code $\pi\theta$ in place
of each code of the form $\pi 1\theta$. The new encoding has a lower cost than
the old one, is still alphabetical and still has the prefix property. Hence
the original encoding was not a best alphabetical encoding.

Lemma 1: Let $\pi$ be a prefix. In a best alphabetical encoding, if there is a
code with prefix $\pi 0$ there is one with prefix $\pi 1$. Conversely, if there is a
code with prefix $\pi 1$, there is one with prefix $\pi 0$.

Proof: If $\pi 0$ is a prefix, then by Theorem 3 the sequence $\pi 111\cdots$
must have some code $C_i$ as a prefix. But by the prefix property, $C_i$
cannot be a prefix of $\pi 0$; hence, $C_i$ has prefix $\pi 1$. The converse is proved
similarly.

Lemma 2: Let $L_a$ be the letter of lowest probability. In a best alphabetical
encoding, $L_a$, together with one of $L_{a+1}$ or $L_{a-1}$, must form a prefix set.

Proof: Suppose $C(L_a)$ ends in 0, say $C(L_a) = \pi 0$, where $\pi$ stands for
some prefix. By Lemma 1, $\pi 1$ is a prefix of $C(L_{a+1})$. If $C(L_{a+1}) = \pi 1$,
we have the desired result. If not, $\pi 10$ must be a prefix of $C(L_{a+1})$.
By Lemma 1 there exist codes with prefix $\pi 11$. A better encoding (and
hence a contradiction) may be had by the following changes: Lengthen
$C(L_a)$ from $\pi 0$ to $\pi 00$. Change all codes of the form $\pi 10y$ to $\pi 01y$. Shorten
all codes of the form $\pi 11y$ to $\pi 1y$. Since the last change applies to at
least one letter (of higher probability than $L_a$), there is a net decrease
in cost.

The proof in the other case ($C(L_a)$ ending in 1) is similar. If, as is the
case of the probabilities of Table I, the least probable letter is at the
end of the alphabet, then this letter has only one neighboring letter and
must form a prefix set with it. Thus, as a first step in Table I, we can
write

$$C(Y) = \pi(Y,Z)0,$$
$$C(Z) = \pi(Y,Z)1,$$

where $\pi(Y,Z)$ is some unknown prefix. Then, using Theorem 2, the problem
is reduced to an encoding for a 26-letter alphabet in which Y and Z
have been replaced by a single letter L(Y,Z) of probability 0.0169.
When this new problem is solved, $\pi(Y,Z)$ will be found as the code for
L(Y,Z). The new least probable letter is J or Q, both with the same
probability 0.0008; J, for example, can be in a prefix set with either I
or K, but Lemma 2 gives no clue for deciding which one. One might
hope that one can always pick the less probable neighbor, K in this
case. However, it is easy to find counter-examples which disprove this
conjecture. A weaker, but true, theorem is the following one.

Theorem 4: Let $L_a$ be the letter of lowest probability. Suppose that

$$p_{a+1} > p_a + p_{a-1}. \tag{7}$$

Then $L_a$ and $L_{a-1}$ must form a prefix set in any best alphabetical encoding.
Similarly, if $p_{a-1} > p_a + p_{a+1}$, $L_a$ and $L_{a+1}$ must form a prefix set.

Proof: Suppose (7) holds but that $L_a$ and $L_{a-1}$ do not form a prefix
set. Then, by Lemma 2, $L_a$ and $L_{a+1}$ form a prefix set. The codes for $L_a$
and $L_{a+1}$ must be of the form

$$C(L_a) = \pi 0, \qquad C(L_{a+1}) = \pi 1$$

for some prefix $\pi$. The code $C(L_{a-1})$ must end in 1, say $C(L_{a-1}) = \rho 1$.
For, if $C(L_{a-1})$ were $\rho 0$, Lemma 1 would show that some code has prefix
$\rho 1$ and hence must stand for a letter between $L_{a-1}$ and $L_a$ in the
alphabetical order, an impossibility. Lemma 1 now shows that some other
letters have prefix $\rho 0$.

We consider two cases determined by the numbers $N(\pi)$ and $N(\rho)$ of
digits in $\pi$ and $\rho$:

Case 1: $N(\pi) < N(\rho)$. An improved encoding can be made by changing
$C(L_a)$ from $\pi 0$ to $\pi 01$, $C(L_{a-1})$ from $\rho 1$ to $\pi 00$ and all codes of the form
$\rho 0y$ to $\rho y$. The last change, a shortening, affects some codes and so
offsets the lengthening of the least probable code.

Case 2: $N(\rho) \le N(\pi)$. An improvement can be made by shortening
$C(L_{a+1})$ from $\pi 1$ to $\pi$ while changing $C(L_a)$ from $\pi 0$ to $\rho 11$ and $C(L_{a-1})$
from $\rho 1$ to $\rho 10$. That there is a net decrease in cost follows from (7).

The other half of the theorem is proved in a similar way.

Applying Theorem 4 to our reduced problem of Table I, we obtain 
further reductions, producing new letters L(J,K) and L(P,Q) with prob- 
abilities 0.0057 and 0.0160. Now the lowest-probability letter has become 
X, and we need another kind of theorem. 
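
The test in Theorem 4 is a pair of comparisons, and a small sketch may make
the two reductions just mentioned easier to follow. The function name and its
return convention are illustrative assumptions, not notation from the paper.
The probabilities of K and P are not quoted in this passage; they are inferred
here as 0.0057 - 0.0008 = 0.0049 and 0.0160 - 0.0008 = 0.0152 from the sums
given above, while those of I and R are the ones listed in Table IV.

```python
def theorem4_partner(p_prev, p_a, p_next):
    """Apply Theorem 4 to a least probable letter L_a.

    p_prev, p_a, p_next are the probabilities of L_(a-1), L_a, L_(a+1).
    Returns -1 if L_a must form a prefix set with L_(a-1), +1 if it must
    form one with L_(a+1), and None if the theorem decides nothing.
    """
    if p_next > p_a + p_prev:      # inequality (7)
        return -1
    if p_prev > p_a + p_next:      # the "similarly" half of the theorem
        return +1
    return None


# J lies between I and K (K's probability is inferred, see above).
partner_of_J = theorem4_partner(0.0575, 0.0008, 0.0049)   # +1, i.e. K
# Q lies between P and R (P's probability is inferred, see above).
partner_of_Q = theorem4_partner(0.0152, 0.0008, 0.0484)   # -1, i.e. P
```

The two calls agree with the new letters L(J,K) and L(P,Q) formed in the text.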

Theorem 5: If $L_i$ and $L_j$ ($i < j$) are two letters both of probability
exceeding $p_{i+1} + p_{i+2} + \cdots + p_{j-1}$, then the intervening letters
$L_{i+1}, L_{i+2}, \ldots, L_{j-1}$ form a prefix set in any best alphabetical encoding.

Proof: Let $\pi$ denote the greatest common prefix of $C(L_i)$ and $C(L_j)$,
i.e., a prefix such that $\pi 0$ is a prefix of $C(L_i)$ while $\pi 1$ is a prefix of $C(L_j)$.
The intervening letters have either $\pi 0$ or $\pi 1$ as prefixes. Supposing that
there are some intervening letters with prefix $\pi 0$, we assert that the
intervening letters with prefix $\pi 0$ form a prefix set. To prove this assertion,
let the intervening letters with prefix $\pi 0$ be $L_{i+1}, \ldots, L_a$, where $C(L_{a+1})$
has prefix $\pi 1$. Let $\pi 0\rho$ denote the greatest common prefix of $C(L_i)$ and
$C(L_a)$. Then $C(L_a)$ must have prefix $\pi 0\rho 1$; otherwise, by Lemma 1, $L_{a+1}$
would have prefix $\pi 0\rho$, and hence $\pi 0$. Also, $C(L_i)$ has prefix $\pi 0\rho 0$;
otherwise, $\pi 0\rho 1$ would be a greater common prefix than $\pi 0\rho$. The assertion
requires only that we prove that $C(L_{i+1})$ has prefix $\pi 0\rho 1$, for then the
letters in question and no others have this prefix. If, on the contrary,
$C(L_{i+1})$ has prefix $\pi 0\rho 0$, find the greatest common prefix $\pi 0\rho 0\sigma$ such
that $\pi 0\rho 0\sigma 0$ is a prefix of $C(L_i)$ and $\pi 0\rho 0\sigma 1$ is a prefix of $C(L_{i+1})$. Now
shorten all codes of the form $\pi 0\rho 0\sigma 0y$ to $\pi 0\rho 0\sigma y$ and lengthen all
other codes $\pi 0\rho y$ to $\pi 0\rho 1y$. The shortened codes include the one for
$L_i$, which has more probability than the total probability of all the
lengthened codes. The assertion is now proved, and likewise intervening
letters with prefix $\pi 1$ form a prefix set.

By our two assertions, each of $C(L_{i+1}), \ldots, C(L_{j-1})$ has one of two
prefixes, which we may call $\pi 0\rho 1$ and $\pi 1\tau 0$, while $\pi 0\rho 0$ is a prefix of
$C(L_i)$ and $\pi 1\tau 1$ is a prefix of $C(L_j)$. Again, one proves the theorem by
making changes which put the intervening letters into a single prefix
set. There are two cases:

Case 1: $N(\pi 0\rho) \le N(\pi 1\tau)$. Lengthen codes $\pi 0\rho 1y$ to $\pi 0\rho 10y$. Change
codes $\pi 1\tau 0y$ to $\pi 0\rho 11y$. Shorten all codes $\pi 1\tau 1y$ to $\pi 1\tau y$. The intervening
letters now form a prefix set with prefix $\pi 0\rho 1$ and the new encoding has
smaller cost.

Case 2: $N(\pi 1\tau) \le N(\pi 0\rho)$. By changes similar to those of Case 1, one
may reduce the cost by making the intervening letters into a prefix set
with prefix $\pi 1\tau 0$.

Applying Theorem 5 to Table I, we now recognize new prefix sets and 
reduce the problem by introducing new letters L(F,G) and L(U,V, 
W,X,Y,Z) of probabilities 0.0360 and 0.0668. Now L(J,K) becomes 
the least probable letter, Theorem 4 applies, and we form a new letter 
L(J,K,L) of probability 0.0378. Next, Theorem 4 applies to letter B, and 
we form a new letter L(B,C) of probability 0.0345. Again we are at an 
impasse. 

Theorem 6: If $p_1 < p_3$, then $L_1$ and $L_2$ form a prefix set in any best
alphabetical encoding. Similarly, if $L_n$ is the last letter of the alphabet,
$L_{n-1}$ and $L_n$ must form a prefix set if $p_n < p_{n-2}$.

Proof: If $p_1 < p_3$ and $L_1$ and $L_2$ are not a prefix set, then $C(L_1)$, $C(L_2)$
and $C(L_3)$ may be shown to have the forms $\pi 0$, $\pi 1\rho 0$ and $\pi 1\rho 1y$. Then
one could improve the encoding by changing $C(L_1)$ to $\pi 00$, $C(L_2)$ to
$\pi 01$ and all codes $\pi 1\rho 1y$ to $\pi 1\rho y$.

This theorem provides no further reduction of our example. Note,
however, that it might have been applied following the creation of
L(Y,Z) to prove that X,Y,Z forms a prefix set. This information is
helpful when we must add the final digits to the prefix $\pi(U,V,\ldots,Z)$
to form the codes for U, ..., Z. Using Huffman’s encoding method, we
find, disregarding questions of alphabetical order, the best way of
encoding four letters which have probabilities in the same ratio as our
letters U,V,W and L(X,Y,Z). The solution gives each letter two digits.
Then, an equally good alphabetical encoding gives these letters the codes
00, 01, 10, 11. We now know parts of the codes sought, as summarized
in Table III. The unknown prefixes $\pi(B,C)$, ... are to be determined
by finding a best alphabetical encoding of the 17-letter alphabet listed
in Table IV.

TABLE III

Letter    Code

B         $\pi(B,C)0$
C         $\pi(B,C)1$
F         $\pi(F,G)0$
G         $\pi(F,G)1$
J         $\pi(J,K,L)00$
K         $\pi(J,K,L)01$
L         $\pi(J,K,L)1$
P         $\pi(P,Q)0$
Q         $\pi(P,Q)1$
U         $\pi(U,\ldots,Z)00$
V         $\pi(U,\ldots,Z)01$
W         $\pi(U,\ldots,Z)10$
X         $\pi(U,\ldots,Z)110$
Y         $\pi(U,\ldots,Z)1110$
Z         $\pi(U,\ldots,Z)1111$


TABLE IV

Letter        Probability    Number of Digits

Space         0.1859
A             0.0642
L(B,C)        0.0345
D             0.0317
E             0.1031
L(F,G)        0.0360
H             0.0467
I             0.0575
L(J,K,L)      0.0378
M             0.0198
N             0.0574
O             0.0632
L(P,Q)        0.0160
R             0.0484
S             0.0514
T             0.0796
L(U,...,Z)    0.0668

[The entries of the Number of Digits column are not legible in this copy.]

Again we might try a Huffman encoding for Table IV. However, we
note in advance that M and L(P,Q) are much less probable than their
neighbors. Then a Huffman encoding will give these letters such long
codes that there will be no alphabetical encoding which uses the same
length codes for every letter. To circumvent this difficulty we use Lemma
2, first on L(P,Q) and next on M, and conclude that L(P,Q) must form
a prefix set with O or R and M must form a prefix set with L(J,K,L) or
N. There are then four new alphabets to consider, and we have
constructed Huffman encodings for each one. The one with smallest cost
is the one in which J,K,L,M and P,Q,R were made into new letters.
The numbers of digits for the letters in Table IV which this Huffman
encoding required are listed. We next look for an alphabetical encoding
in which the same numbers of digits are used. Such an encoding actually
exists, and so we obtain the best alphabetical encoding shown in Table
I. It must be admitted that we were somewhat lucky to be able to reduce
the problem to one in which one of the best possible encodings,
disregarding alphabetical order, includes an alphabetical encoding.
Undoubtedly, minor changes in the probabilities in Table I might make the
problem much harder. In the next section we give an encoding method which
will apply in all cases.


V. THE GENERAL ALPHABETIZING ALGORITHM 


The method which will be used in general builds up the best alpha- 
betical encoding for the entire alphabet by first making best alphabetical 
encodings for certain subalphabets. In particular, the subalphabets 
which will be considered will be only those which might form a prefix 
set in some alphabetical binary encoding of the whole alphabet. Since 
only those sets of letters consisting exactly of all those letters which lie 
between some pair of letters can serve as a prefix set, we will call such a 
set an allowable subalphabet. 

We will denote the allowable subalphabet consisting of all of those
letters which follow $L_i$ in the alphabet (including $L_i$ itself) and which
precede $L_j$ (again including $L_j$ itself) by $(L_i, L_j)$. When referring to the
ordinary English alphabet of Table I we will use the symbol # for the
space symbol. Thus, (#,B) will be the subalphabet containing the three
symbols space, A and B, and (A,A) will be used to denote the subalphabet
containing only the letter A.

If it were desired to find an optimum encoding satisfying certain kinds 
of restrictions other than the alphabetical one, different allowable sub- 
alphabets could be used, with the rest of the algorithm remaining analo- 
gous. This method of building up an encoding by combining encodings 
for subalphabets is analogous to the method used by Huffman,¹ except
that he was able to organize his algorithms such that no subalphabets 
were used except those which actually occurred as prefix sets in his 
final encoding. However, we consider all allowable subalphabets, in- 
cluding some which are not actually used as part of the final encoding. 

The term cost of an encoding has been used to refer to the average
number of binary digits per letter of transmitted message, that is,
$\sum_i p_i N_i$. Since, in the algorithm to be described, we will be constructing
an encoding for each allowable subalphabet, we will also use the
corresponding sum for each subalphabet. But, since the probabilities $p_i$
do not even add up to 1 for proper subalphabets, the sum $\sum_i p_i N_i$ does
not correspond exactly to a cost of transmitting messages, and so the
corresponding sum will be called a partial cost.

The algorithm to be described takes place in $n$ stages, where $n$ is the
number of letters in the alphabet. At the kth stage, the best alphabetical
binary encoding for each k-letter allowable subalphabet will be
constructed and its partial cost will be computed. For $k = 1$, each
subalphabet of the form $(L_i, L_i)$ will be encoded by the trivial encoding which
encodes $L_i$ with the null sequence; it has cost 0, since the number of
digits in the null sequence is zero. For $k = 2$, each subalphabet of the
form $(L_i, L_{i+1})$ will be encoded by letting the code for $L_i$ be 0 and the
code for $L_{i+1}$ be 1. The partial cost of this encoding is $p_i + p_{i+1}$. In
general, the kth stage of the algorithm, in which it is desired to find the
best alphabetical binary encoding for each subalphabet of the form
$(L_i, L_{i+k-1})$ and its partial cost, proceeds by making use of the codes
and the partial costs computed in the previous stages.

For each $j$ between $i + 1$ and $i + k - 1$, we can define a binary
alphabetical encoding as follows: Let $C_i, C_{i+1}, \ldots, C_{j-1}$ be the codes for
$L_i, L_{i+1}, \ldots, L_{j-1}$ given by the (previously constructed) best alphabetical
encoding for $(L_i, L_{j-1})$, and let $C_j, C_{j+1}, \ldots, C_{i+k-1}$ be the codes for
$L_j, L_{j+1}, \ldots, L_{i+k-1}$ given by the (previously constructed) best
alphabetical encoding for $(L_j, L_{i+k-1})$. Then the new encoding for
$L_i, L_{i+1}, \ldots, L_{j-1}, L_j, L_{j+1}, \ldots, L_{i+k-1}$ will be
$0C_i, 0C_{i+1}, \ldots, 0C_{j-1}, 1C_j, 1C_{j+1}, \ldots, 1C_{i+k-1}$.
Such an encoding can be defined for each $j$, and
the encoding is exhaustive. It follows from Theorem 2 that the best
encoding for this subalphabet is given by one of the $k - 1$ such encodings
which can be obtained for the $k - 1$ different values of $j$. The
partial cost of such an encoding made up out of two subencodings is the
sum of the partial costs of the two subencodings plus
$p_i + p_{i+1} + \cdots + p_{i+k-1}$. To perform the algorithm it will not be
necessary to construct all of these encodings, but only to compute enough
to decide which one of the $k - 1$ different encodings has the lowest partial
cost. This is done by taking the sums of each of the $k - 1$ pairs of partial
costs of subencodings and constructing the best encoding only.

After the kth stage of this algorithm has been completed for $k = 1,
2, \ldots, n$, the final encoding obtained is the best alphabetical encoding
for the entire original alphabet, and the final partial cost obtained is the
cost of this best alphabetical encoding.
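
The staged procedure described above can be sketched as follows. This is a
minimal illustration under stated assumptions rather than the authors' own
program: letters are indexed from 0, the function and variable names are
invented for the sketch, and ties between equally good values of j are broken
in favor of the smallest j, so the codes returned may differ from another
best encoding of the same cost.

```python
from itertools import accumulate


def best_alphabetical_encoding(probs):
    """Return (cost, codes) of a best alphabetical prefix encoding.

    Stage k treats every allowable subalphabet of k consecutive letters
    and reuses the partial costs found in the earlier stages.
    """
    n = len(probs)
    running = [0.0] + list(accumulate(probs))    # running[i] = p_0 + ... + p_(i-1)
    cost = [[0.0] * n for _ in range(n)]         # cost[i][e]: partial cost of letters i..e
    split = [[None] * n for _ in range(n)]       # split[i][e]: first letter given prefix 1

    for k in range(2, n + 1):                    # stage 1 is the all-zero diagonal
        for i in range(n - k + 1):
            e = i + k - 1
            # Letters i..j-1 receive prefix 0 and letters j..e receive prefix 1.
            j_best = min(range(i + 1, e + 1),
                         key=lambda j: cost[i][j - 1] + cost[j][e])
            split[i][e] = j_best
            cost[i][e] = (cost[i][j_best - 1] + cost[j_best][e]
                          + running[e + 1] - running[i])

    codes = [""] * n

    def assign(i, e, head):
        """Write out the codes of letters i..e, all of which begin with head."""
        if i == e:
            codes[i] = head
            return
        j = split[i][e]
        assign(i, j - 1, head + "0")
        assign(j, e, head + "1")

    assign(0, n - 1, "")
    return cost[0][n - 1], codes
```

Only the winning split is stored for each subalphabet, so the codes are
written out once at the end instead of being rebuilt at every stage.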

If the above algorithm were performed on a digital computer, the
length of time required to do the calculation would be proportional to
$n^3$. The innermost inductive loop of the computer program would perform
the operation mentioned above of computing sums of pairs of
partial costs, and this would be done $k - 1$ times in the process of
encoding each one of the subalphabets considered in the kth stage. But,
since there are $n - (k - 1)$ different allowable subalphabets to be encoded
in the kth stage, there are $(k - 1)[n - (k - 1)]$ steps to be done
in the kth stage. To find the total number of operations done in all of
the stages, we sum, and find that

$$\sum_{k=1}^{n} (k - 1)[n - (k - 1)] = \frac{n^3 - n}{6},$$

which is an identity which can be verified by mathematical induction.
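
The operation count can also be checked numerically. The short sketch below
(the function name is an assumption made for illustration) adds up the steps
stage by stage and compares the total with the closed form.

```python
def stage_operations(n):
    """Total of (k - 1) * (n - (k - 1)) over the stages k = 1, ..., n."""
    return sum((k - 1) * (n - (k - 1)) for k in range(1, n + 1))


# Compare with (n^3 - n) / 6, using integers to avoid rounding.
identity_holds = all(6 * stage_operations(n) == n ** 3 - n
                     for n in range(1, 200))
```

For a 27-letter alphabet the count is 3276 sums of pairs of partial costs.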


VI. PROPERTIES OF EXHAUSTIVE ENCODINGS 


We have already shown (Theorem 3) that every best alphabetical 
encoding is exhaustive. Another reason for considering exhaustive en- 
codings to be of some general interest is given by the following theorem. 

Theorem 7: The Huffman binary encoding of any alphabet is exhaustive. 

Proof: We prove by induction that each of the encodings for prefix
sets arrived at during the steps of the algorithm of Huffman¹ is an
exhaustive encoding. If this holds for the first k encodings constructed
during this algorithm, consider the prefix set L encoded at the (k + 1)th
step. Let $x = x_1 x_2 x_3 \cdots$ be any infinite sequence of binary digits. It
suffices to show that there is some letter whose code is a prefix of $x$.
The set L was made by combining two previous prefix sets of letters,
L' and L'', and it was encoded by prefixing the codes from their previous
encodings by 0 and 1 respectively. Let L' be the set whose codes were
prefixed by $x_1$. Then if L' is a single letter, $x_1$ is its code, and hence its
code is a prefix of $x$. But if L' is a prefix set, then its previous encoding
is exhaustive by inductive hypothesis, and hence there is a letter L'''
whose previous code is a prefix of $x_2 x_3 \cdots$. Then the new code for L'''
is a prefix of $x$.
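
As an illustration of Theorem 7, the sketch below builds Huffman code lengths
by repeatedly combining the two least probable sets and then evaluates the sum
of $2^{-N_i}$, which equals 1 for an exhaustive encoding (Theorem 8 below).
The heap-based formulation and the names are assumptions of this sketch, and
the probabilities used are those of Table V later in the paper.

```python
import heapq
from fractions import Fraction
from itertools import count


def huffman_code_lengths(probs):
    """Number of binary digits Huffman's procedure gives each letter."""
    tie = count()                      # unique tie-breaker; lists are never compared
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, set1 = heapq.heappop(heap)
        p2, _, set2 = heapq.heappop(heap)
        for i in set1 + set2:          # every letter in the merged set gains a digit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), set1 + set2))
    return lengths


def kraft_sum(lengths):
    """Exact value of the sum of 2^(-N_i)."""
    return sum(Fraction(1, 2 ** n) for n in lengths)


lengths = huffman_code_lengths([0.330, 0.005, 0.330, 0.005, 0.330])
```

Here `lengths` holds two-digit codes for the three probable letters and
three-digit codes for the two rare ones, and `kraft_sum(lengths)` is exactly 1.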

Several of the properties of exhaustive encodings will be considered, 
since both the Huffman encoding and the best alphabetical encoding 
are exhaustive, and it seems likely that exhaustive encodings might 
arise from other types of optimizing problems. For instance, the short- 
ening procedure used in Table II was essentially a way of making the 
encoding more nearly exhaustive. 

Lemma 3: Whenever an encoding C has the property that for any infinite
sequence $x = x_1 x_2 x_3 \cdots$ there is a code of C which is a prefix of $x$, then

$$\sum_{i=1}^{n} 2^{-N_i} \ge 1, \tag{8}$$

and equality holds if and only if C has the prefix property.

Proof: Consider the set P of all finite sequences $x$ having length exactly
$k$, where $k$ is some fixed integer greater than the length of the longest
code of C. Then the property assumed in the hypothesis implies that each
element of P has at least one of the codes for a prefix. But P has exactly
$2^k$ elements, and for each code of length $N_i$ there are $2^{k - N_i}$ elements
of P of which it is a prefix. Hence,

$$\sum_{i=1}^{n} 2^{k - N_i} \ge 2^k,$$

which is equivalent to (8), and equality holds if and only if no element
of P has two different codes for a prefix. However, the occurrence of
two different codes which are prefixes of the same sequence is exactly
equivalent to having one of the two codes be a prefix of the other.

Theorem 8: Every exhaustive binary encoding has the prefix property and
satisfies

$$\sum_{i=1}^{n} 2^{-N_i} = 1. \tag{9}$$

Proof: By Lemma 3 and the definition of exhaustive, (8) holds, but,
by McMillan,* unique decipherability implies

$$\sum_{i=1}^{n} 2^{-N_i} \le 1. \tag{10}$$

Then we combine (8) and (10) to obtain (9). But, by Lemma 3, this
implies the prefix property.
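
Theorem 8 suggests a simple test for exhaustiveness. The sketch below checks
the prefix property and the equality (9); it relies on the converse counting
argument (a prefix encoding whose sum equals 1 leaves no sequence uncovered),
which is an assumption of this sketch and is not stated as a theorem in the
text. The function names are illustrative.

```python
from fractions import Fraction


def has_prefix_property(codes):
    """True if no code is a prefix of the code of another letter."""
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(codes)
                   for j, b in enumerate(codes))


def is_exhaustive(codes):
    """Test the conclusions of Theorem 8 for a list of binary code strings."""
    if len(codes) == 1:                # one-letter alphabet: null sequence only
        return codes[0] == ""
    total = sum(Fraction(1, 2 ** len(c)) for c in codes)
    return has_prefix_property(codes) and total == 1
```

For example, `is_exhaustive(["000", "001", "01", "10", "11"])` holds, while
`["0", "10"]` fails because no code begins with 11.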

Lemma 4: For any exhaustive encoding of an alphabet, and any prefix
$\pi$ of this encoding, the new encoding of the prefix-set subalphabet which
associates the new code $\theta$ with each letter whose original code was $\pi\theta$ is an
exhaustive encoding of this subalphabet.

Proof: Given any $x$, to find a letter whose new code is a prefix of $x$ we
consider the letter L whose original code was a prefix of $\pi x$. Then, by
the prefix property, the original code of L cannot be a prefix of $\pi$, and
thus the original code of L is of the form $\pi\theta$. Hence, L is in the
subalphabet, its new code is $\theta$, and $\theta$ is a prefix of $x$. To complete the proof that
the new encoding is exhaustive, note that it has the prefix property
because the original encoding does. Hence, the new encoding is either the
trivial encoding (of a one-letter alphabet) or is uniquely decipherable.

Lemma 5: For any exhaustive binary encoding of an alphabet having 
n letters, the total number of prefixes is 2n — 1. 

Lemma 6: In any exhaustive binary encoding of an alphabet having 
n letters, none of the codes consist of more than n — 1 digits. 







Each of the last two lemmas associates a number with each exhaustive 
encoding, and they can be proved by induction on the number of letters 
in the alphabet. The number associated with each exhaustive encoding 
is represented in terms of the number associated with each of the two 
encodings that are constructed as described in Lemma 4 for the subalpha- 
bet having the prefix 0 and the subalphabet having the prefix 1. 
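
Lemmas 5 and 6 can be checked directly on small examples. The sketch below
assumes that the count in Lemma 5 includes the null sequence and the codes
themselves among the prefixes; the function name is illustrative.

```python
def all_prefixes(codes):
    """Every prefix of every code, including the null sequence and the codes."""
    return {c[:k] for c in codes for k in range(len(c) + 1)}


balanced = ["000", "001", "01", "10", "11"]       # an exhaustive encoding, n = 5
skewed = ["0", "10", "110", "1110", "1111"]       # longest code has n - 1 digits

prefix_counts = [len(all_prefixes(balanced)), len(all_prefixes(skewed))]
longest_codes = [max(map(len, balanced)), max(map(len, skewed))]
```

Both encodings have 2n - 1 = 9 prefixes, and the second one shows that the
bound of n - 1 digits in Lemma 6 can be attained.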

Theorem 9: The cost of the Huffman encoding of an alphabet is a con- 
tinuous function of the probabilities of the letters. 

Theorem 10: The cost of the best alphabetical encoding of an alphabet is 
a continuous function of the probabilities of the letters. 

The last two theorems will be proved together, enclosing in parentheses
the changes which convert the proof of Theorem 9 into a proof for
Theorem 10. In fact, what will be proved are the slightly stronger theorems:
For two alphabets $A$ and $A^*$ having the same $n$ letters, if $p_i$ is the
probability of the ith letter of $A$, $p_i^*$ is the probability of the ith letter of $A^*$,
and if $k$ and $k^*$ are the costs of the Huffman encoding (best alphabetical
encoding) for $A$ and $A^*$, then

$$|k - k^*| \le (n - 1) \sum_{i=1}^{n} |p_i - p_i^*|. \tag{11}$$

If we let $B$ be the right member of inequality (11) and let $k'$ be the
cost of using the Huffman (best alphabetical) encoding of $A^*$ as an
encoding for $A$, then, by Lemma 6 and the definition of cost, we can
conclude that $|k' - k^*| \le B$ and, since from the definition of $k$ we can
conclude that $k \le k'$, we can combine these to obtain $k - k^* \le B$.
By a similar argument involving the use of $k''$, the cost of using the
Huffman (best alphabetical) encoding of $A$ as an encoding for $A^*$, we obtain
$k^* - k \le B$. Combining these, we obtain (11).

Theorem 11: The Huffman encoding for a given alphabet has a cost which 
is less than or equal to that of any uniquely decipherable encoding for that 
alphabet. 

Proof: This proof is essentially that of McMillan.’ Let us consider any
uniquely decipherable encoding C. We will construct a new encoding C'
which has the same cost as C, and which has the prefix property.
However, by its method of construction, the Huffman encoding has a cost
which is less than or equal to that of any encoding having the prefix
property, completing the proof of the theorem. Let $N_i$ be the number of
digits in the code which C associates with the ith letter of the alphabet.
Let the letters of the alphabet be renumbered in such a way that
$N_i \le N_{i+1}$. Then, as in the encoding theorem (Theorem 1 of this paper, or
Theorem 9 of Shannon’), we let

$$A_i = \sum_{j=1}^{i-1} 2^{-N_j},$$

and we define C' to be the encoding which associates with the ith letter
the code $C_i'$ obtained by truncating $A_i$ after $N_i$ digits. Then it follows that
the digits truncated were 0's, and hence that each $C_i'$ agrees numerically
with the corresponding $A_i$. By (10), each of the $A_i$ is less than 1. To
show that C' has the prefix property, we assume that $C_i'$ is a prefix of
$C_j'$. Then $i < j$, by the renumbering. However, $A_{i+1} = A_i + 2^{-N_i}$, and
hence $A_j \ge A_i + 2^{-N_i}$. Thus, $A_j$ cannot agree with the first $N_i$ places
of $A_i$. Hence, the first $N_i$ digits of $C_j'$ are different from those of $C_i'$,
contradicting the assumption.
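
The construction used in this proof can be carried out mechanically. The
sketch below takes code lengths satisfying (10), forms the partial sums $A_i$
as exact fractions, and writes each $A_i$ to $N_i$ binary places. The function
name and the use of exact fractions are assumptions made for the illustration.

```python
from fractions import Fraction


def prefix_codes_from_lengths(lengths):
    """Codes with the prefix property having the given numbers of digits."""
    if sum(Fraction(1, 2 ** n) for n in lengths) > 1:
        raise ValueError("lengths violate inequality (10)")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])   # N_i <= N_(i+1)
    codes = [None] * len(lengths)
    a = Fraction(0)                     # A_i: sum of 2^(-N_j) over earlier letters
    for i in order:
        n = lengths[i]
        scaled = a * 2 ** n             # a whole number because earlier N_j <= n
        digits = format(scaled.numerator, "b").zfill(n) if n else ""
        codes[i] = digits               # A_i written to exactly n binary places
        a += Fraction(1, 2 ** n)
    return codes


codes = prefix_codes_from_lengths([2, 3, 2, 3, 2])
```

With these lengths the result is 00, 110, 01, 111, 10: the same numbers of
digits as requested, and no code is a prefix of another.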

Theorem 12: If $A_n$ is the number of exhaustive binary alphabetical
encodings for an alphabet having $n$ letters, $A_1 = A_2 = 1$, and for $n \ge 3$ we
have

$$A_n = \frac{2\,(2n - 3)!}{(n - 2)!\; n!}. \tag{12}$$

Theorem 13: If $T_n$ is the total number of exhaustive binary encodings for
an alphabet having $n$ letters, $T_1 = 1$, $T_2 = 2$ and, for $n \ge 3$, we have

$$T_n = \frac{2\,(2n - 3)!}{(n - 2)!}. \tag{13}$$

These theorems show how rapidly $A_n$ and $T_n$ increase with increasing
$n$. Since, by Theorem 3, $A_n$ would be the number of encodings to consider
if it were desired to find the best alphabetical encoding by enumeration,
Theorem 12 shows that the methods already given in this paper
(even the general alphabetizing algorithm) are much faster than exhaustive
enumeration. Similarly, Theorem 7 and Theorem 8 show how much
slower exhaustive enumeration is than the algorithm given by Huffman.¹

Each of the $A_n$ alphabetical encodings may be converted into $n!$ of
the $T_n$ encodings by permuting its codes in all possible ways. It follows
that $T_n = n!\,A_n$, and it suffices to prove Theorem 12. Consider for $n \ge 2$
an exhaustive alphabetical encoding of $n$ letters. Some number
$k = 1, \ldots, n - 1$ of these letters has a code with prefix 0. These $k$ codes,
each with its leading digit 0 removed, have been shown (Lemma 4) to
form one of the $A_k$ exhaustive alphabetical encodings of $k$ letters.
Similarly, the remaining $n - k$ codes, minus their leading digits 1, form one of
the $A_{n-k}$ exhaustive alphabetical encodings of $n - k$ letters. Thus, if
$n \ge 2$,

$$A_n = \sum_{k=1}^{n-1} A_k A_{n-k}, \tag{14}$$

while $A_1 = 1$. To solve (14), construct the generating function
$a(x) = A_1 x + A_2 x^2 + A_3 x^3 + \cdots$. By (14), $a(x) = x + a^2(x)$; i.e.,

$$a(x) = \tfrac{1}{2}\left(1 - \sqrt{1 - 4x}\right). \tag{15}$$

The negative sign of the square root is needed to make $a(0) = 0$. The
series for $a(x)$ is obtained using the binomial theorem with power $\tfrac{1}{2}$.
The coefficient of $x^n$ (which is $A_n$) has the expression (12).
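
The agreement between the recurrence (14) and the closed forms (12) and (13)
can be confirmed for small n. The sketch below is only a numerical check; the
function names are assumptions made for the illustration.

```python
from math import factorial


def alphabetical_counts(n_max):
    """A_1, ..., A_n_max from recurrence (14); index 0 is unused padding."""
    a = [0, 1]                                   # a[1] = A_1 = 1
    for n in range(2, n_max + 1):
        a.append(sum(a[k] * a[n - k] for k in range(1, n)))
    return a


def closed_form_A(n):
    """Expression (12), valid for n >= 3."""
    return 2 * factorial(2 * n - 3) // (factorial(n - 2) * factorial(n))


def closed_form_T(n):
    """Expression (13), valid for n >= 3."""
    return 2 * factorial(2 * n - 3) // factorial(n - 2)


A = alphabetical_counts(12)
agrees = all(A[n] == closed_form_A(n) and
             factorial(n) * A[n] == closed_form_T(n)
             for n in range(3, 13))
```

The first values of the recurrence are 1, 1, 2, 5, 14, 42.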


VII. ENCODINGS WITHOUT THE PREFIX PROPERTY 


So far in this paper very little has been said about encodings without 
the prefix property. For instance, we restricted the best alphabetical 
encoding to be the encoding having the lowest cost among all alphabetical 
order-preserving encodings having the prefix property. However, in view 
of the fact that the special encoding given in Table I is an alphabetical 
encoding and has cost 4.1801, it appears to be advantageous to dispense 
with the prefix property requirement. However, not very much is known 
about the properties of encodings lacking the prefix property, and, in 
fact, it is not known whether the special encoding given in Table I can 
be further improved or not. In fact, it was not constructed on the basis 
of any general procedure, but was found by a heuristic method. The next 
few paragraphs will give a few results which we have found about en- 
codings without the prefix property, but will also give some examples of 
the difficulties which it is possible to get into when using such encodings. 

It should be noted that a message which begins with the letter Y in 
the special encoding cannot be deciphered as soon as the Y has been 
received, but it is necessary to wait for further received digits in order to 
distinguish it from a Z. In particular, in the case of the message enci- 
phered as 11111101111110 it is necessary to wait for the 14th received 
binary digit before the first letter can be deciphered. 

In general, we will say that the delay of a presumed message is d if it 
is necessary to wait for the receipt of the first d binary digits before the 
first transmitted letter can be recognized. We will say that the delay of 
an encoding is d if d is the least upper bound of the delays of all pre- 
sumed messages of that encoding. We will say that an encoding has the 
finite delay property if the delay of that encoding is finite. For instance, 
the special encoding of Table I has the finite delay property, and in fact 
has delay 14. 

Theorem 14: If an encoding C has infinite delay, then there exists a pre- 
sumed message of C which has infinite delay. 

Proof: Given an encoding C with infinite delay, there exists an infinite
sequence of presumed messages $M_1, M_2, M_3, \ldots$ such that $M_i$ has
delay at least $i$. Then either the set of those presumed messages $M_i$ whose
first binary digit is 0 or the set whose first binary digit is 1 is an infinite
set. We thus can choose an infinite subsequence of presumed messages
$M_1', M_2', M_3', \ldots$ such that $M_i'$ has delay at least $i$ and such that all of
the messages agree on the first binary digit. Proceeding by induction,
we can choose at the kth step a subsequence of presumed messages which
all agree on the first $k$ digits. Then the infinite presumed message whose
kth binary digit is the kth binary digit of all presumed messages
remaining after the kth inductive step is a presumed message, and has
infinite delay.

For an encoding to be useful in practice, it seems likely that it must
have the finite delay property. This would permit a deciphering machine 
to be built having only a finite amount of memory, and it would permit 
two-way communication (as in telephony) to be almost instantaneous. 
However, in delayed communication systems (common in telegraphy) 
for which a tape is used for storing messages, this tape might be used to 
provide the unbounded amounts of memory needed to decipher an infi- 
nite delay encoding. 

To investigate further the problems of designing an optimal-cost en- 
coding of any sort (such as an alphabetical-order encoding), without 
requiring it to have the prefix property, it should be remarked that the 
problem is finite, but not necessarily easy to attack. That is, given an 
alphabet in which all of the letters have positive probability, and given a 
constant K, there are only a finite number of encodings of this alphabet 
which have a cost less than K. For if m is the smallest of the probabilities, 
there are not more than K/m digits in the longest code of any such en- 
coding, and there are only a finite number of encodings of an n-letter 
alphabet in which each code has length less than K/m. However, this 
number would be astronomically large for any alphabet of reasonable 
size. 

One particular way of generating encodings which will be used in a 
few examples below is of some general interest. The reversal of an en- 
coding C is a new encoding (which will be called C* for the remainder 
of this paper) which is obtained by letting the code for each letter be 
written in the reverse order. This interchanges the direction of increas- 
ing time, and changes many of the properties of the encoding, but it 
does preserve unique decipherability. 

Table V demonstrates many of the properties and complications of
encodings, contrasting the one having the prefix property with three
other encodings lacking this property. Each of the four encodings shown
preserves alphabetical order, and each is uniquely decipherable. The
first encoding has the prefix property, and in fact is the best alphabetical
encoding in the sense used in this paper. However, it has an appreciably
higher cost than any of the other three encodings, none of which has
the prefix property. The reversals of each of the last three encodings have
the prefix property, but the reversal of the first encoding does not.

TABLE V

Letter  Probability  First Code  Second Code  Third Code  Fourth Code

A       0.330        000         00           00          00
B       0.005        001         001          0011        00111
C       0.330        01          10           01          01
D       0.005        10          101          0111        01111
E       0.330        11          11           10          10
Cost                 2.335       2.01         2.02        2.03
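
The statements made about Table V can be reproduced with a few lines. The
sketch below recomputes the four costs and tests the prefix property of each
encoding and of its reversal; the dictionary layout and function names are
assumptions made for the illustration, and the codes are those of Table V.

```python
PROBS = [0.330, 0.005, 0.330, 0.005, 0.330]          # letters A, B, C, D, E
TABLE_V = {
    "first":  ["000", "001", "01", "10", "11"],
    "second": ["00", "001", "10", "101", "11"],
    "third":  ["00", "0011", "01", "0111", "10"],
    "fourth": ["00", "00111", "01", "01111", "10"],
}


def cost(codes):
    """Average number of binary digits per letter."""
    return sum(p * len(c) for p, c in zip(PROBS, codes))


def has_prefix_property(codes):
    """True if no code is a prefix of the code of another letter."""
    return not any(i != j and b.startswith(a)
                   for i, a in enumerate(codes)
                   for j, b in enumerate(codes))


def reversal(codes):
    """Write every code in the reverse order."""
    return [c[::-1] for c in codes]


costs = {name: round(cost(codes), 3) for name, codes in TABLE_V.items()}
prefix = {name: has_prefix_property(codes) for name, codes in TABLE_V.items()}
reversed_prefix = {name: has_prefix_property(reversal(codes))
                   for name, codes in TABLE_V.items()}
```

Only the first encoding has the prefix property, while the reversals of the
other three do, in agreement with the text.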

The second encoding of Table V has the lowest possible cost of any 
uniquely decipherable binary encoding by Theorem 11, since it is the 
reversal of a Huffman encoding. However, the second encoding has 
infinite delay, since the presumed message 001111... has infinite delay.
Furthermore, the second encoding, although it preserves the alphabetical 
order of individual letters, does not preserve the alphabetical order of 
words made up out of these letters. For instance, the enciphered form 
of CE is a larger binary number than the enciphered form of DA, al- 
though the latter occurs later in alphabetical order. The property of 
preserving alphabetical order of all words will be called the strong alpha- 
betical property, and it has already been shown that alphabetical en- 
codings having the prefix property have the strong alphabetical prop- 
erty. However, both the alphabetical encoding and the special encoding 
of Table I have the strong alphabetical property, and all of the en- 
codings of Table V except the second encoding have the strong alpha- 
betical property. There would be very little to be gained by employing 
an alphabetical order encoding for sorting or dictionary purposes unless 
it had the strong alphabetical property. 
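The CE and DA example, and the claim about the other encodings of Table V, can be tried out with a short brute-force search. The sketch below is a finite test only: it examines all words of at most three letters, and it assumes that enciphered forms are compared digit by digit from the left, with a proper prefix counted as the smaller of the two. The names used are illustrative.

```python
from itertools import product

TABLE_V = {
    "first":  {"A": "000", "B": "001",   "C": "01", "D": "10",    "E": "11"},
    "second": {"A": "00",  "B": "001",   "C": "10", "D": "101",   "E": "11"},
    "third":  {"A": "00",  "B": "0011",  "C": "01", "D": "0111",  "E": "10"},
    "fourth": {"A": "00",  "B": "00111", "C": "01", "D": "01111", "E": "10"},
}

def encipher(word, codes):
    """Concatenate the codes of the letters of `word`."""
    return "".join(codes[letter] for letter in word)

def has_strong_alphabetical_property(codes, max_len=3):
    """Check every word of up to `max_len` letters (not a proof)."""
    letters = sorted(codes)
    words = sorted("".join(w) for n in range(1, max_len + 1)
                   for w in product(letters, repeat=n))
    enciphered = [encipher(w, codes) for w in words]
    # The enciphered forms must appear in strictly increasing order.
    return all(a < b for a, b in zip(enciphered, enciphered[1:]))

second = TABLE_V["second"]
print(encipher("CE", second), encipher("DA", second))   # 1011 10100
print({name: has_strong_alphabetical_property(codes)
       for name, codes in TABLE_V.items()})
```

Checking adjacent words in the sorted list is sufficient, since any pair out of order forces some adjacent pair to be out of order.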

The third encoding lacks these defects of the second encoding, but it 
has a special one of its own, about which more will be said in the next 
section. This defect has to do with synchronizing, and it can be explained 
in this case by the observation that every code of the third encoding 
has an even number of binary digits. The two phases correspond to the 
odd-numbered and the even-numbered binary digits, so if the deciphering 
circuit starts up while it is out of phase, it can never get back in 
phase. In this case, where there are certain codes which cannot 
occur, the defect could be remedied by designing the circuit to addition- 
ally change phase if it ever receives a code 1011 or 1111, but this adds 
an extra complication to the circuit. However, the first and second 
encodings have the property that each of them will automatically get 
back in synchronism with probability 1, without the addition of any 
other codes or any other special features to the circuit. 

The fourth encoding has none of these defects, and since its cost is so 
near to the least possible, it would undoubtedly be a reasonably good 
choice as a solution, if this particular alphabet had arisen in an actual 
practical problem. 

So far in this paper, each example of an encoding with the finite delay 
property has had a delay equal to $N_{\max}$, where $N_{\max}$ is the number of 
digits of the longest code of the encoding. This result does not hold in 
general, as is illustrated by Table VI. The fifth encoding has $N_{\max} = 6$, 
but it has delay 8. 

TABLE VI 

Letter  Fifth Code  Sixth Code 
W       00          00 
X       001         01 
Y       101         10 
Z       110101      11 

The encodings having the finite delay property but not the prefix 
property, such as the special encoding of Table I and the fifth encoding 
of Table VI, provide counterexamples which contradict Remark II of 
Schützenberger (Ref. 5, page 55) and provide the example which is 
asked for in the sentence following Remark I of the same paper. 

As an alternative to the above method of expressing quantitatively 
the finite delay property, we may make the following definitions for use 
later in this paper. We will say that the excess delay of a presumed mes- 
sage is e if it is necessary to wait for the receipt of e binary digits beyond 
the end of the first transmitted letter of the presumed message before 
this first letter can be recognized. We will say that the excess delay of an 
encoding is e if e is the least upper bound of the excess delays of all 
presumed messages of the encoding. 

If d is the delay of an encoding, e is its excess delay, and $N_{\min}$ and $N_{\max}$ 
are, respectively, the minimum and maximum numbers of digits of any 
codes of the encoding, then we obviously have 
$e + N_{\min} \le d \le e + N_{\max}$. Then an encoding has the finite 
delay property if and only if the 
excess delay of that encoding is finite. Also, an encoding has the prefix 
property if and only if the excess delay of that encoding is 0. 
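As a worked illustration of these bounds (an added example, using only the values quoted for the fifth encoding of Table VI: shortest code 00, longest code 110101, delay 8), the inequality can be rearranged to bracket the excess delay:

```latex
\[
e + N_{\min} \le d \le e + N_{\max}
\quad\Longrightarrow\quad
d - N_{\max} \le e \le d - N_{\min},
\]
\[
N_{\min} = 2,\quad N_{\max} = 6,\quad d = 8
\quad\Longrightarrow\quad
2 \le e \le 6 .
\]
```

The bounds alone do not determine e; they show only that the excess delay of the fifth encoding is finite and at least 2, which is consistent with its lacking the prefix property.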


VIII. SELF-SYNCHRONIZING PROPERTIES 


Problems of how to make a transmitting device and a receiving device 
become and remain synchronized with each other are important in the 
engineering design of many kinds of systems. Since the encodings dis- 
cussed in this paper are variable-length, it might seem that the syn- 
chronizing problem for enciphering and deciphering circuits would be 
especially difficult. However, the synchronizing problem is very simple 
for many variable-length binary encodings, because of a particularly 
favorable property which they possess. These remarks can best be il- 
lustrated by an example. Suppose that (using the alphabetical encoding 
of Table I as an example) a message beginning 1110011110100111000. . . 
is received, and we wish to observe how a deciphering circuit would 
decipher it. Since the encoding has the prefix property, the deciphering 
circuit should first find a code which is a prefix of this message, and then 
decode this to obtain the first letter T of this message. Proceeding with 
the remaining part, it then finds the letter H, and then the rest of the 
deciphered version shown in the first line of Table VII, where the sym- 
bol “:” is used to mark the divisions between those sequences of binary 
digits which were deciphered as individual letters. 

TABLE VII 

:   T   :    H    :   A   :   T   : # :
 1 1 1 0 0 1 1 1 1 0 1 0 0 1 1 1 0 0 0
  :    R    :   T   :    M    :   I   :

Next suppose that the same sequence of digits had been received, but 
that the deciphering circuit was not in synchronism with the enciphering 
circuit. In particular, suppose that, when the deciphering circuit was 
first turned on, it was in the state that it would be in if it were partly 
through the operation of deciphering some letter, and that the initial 1 
of the message was interpreted as the last digit of this letter. This de- 
ciphering is indicated on the third line of Table VII. Once again, the 
symbol “:” has been used to mark the divisions between letters. Then 
these two decipherings are out of phase (i.e., out of synchronism) with 
one another at the beginning of the message, but at the end of the re- 
ceived message they are in phase with each other, as is indicated by the 
fact that the “:” symbols align with each other at the right end of Table 
VII. This means that the deciphering circuit would have automatically 
become synchronized, without any special synchronizing circuits or 
synchronizing pulses being necessary. It was, of course, necessary for at 
least two of the codes of the encoding to end in the same sequence of 
digits, but this is very likely to happen for any variable-length encoding, 
unless special efforts are made to prevent it. 
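The same effect can be reproduced in a few lines of Python. The sketch below uses the first encoding of Table V (rather than the alphabetical encoding of Table I) and records the letter boundaries found by a decoder started in phase and by one started one digit late. The message and the function names are chosen here for illustration only.

```python
# First encoding of Table V, arranged for deciphering.
CODES = {"000": "A", "001": "B", "01": "C", "10": "D", "11": "E"}

def boundaries(bits, start, codes=CODES):
    """Positions at which a decoder started at index `start` ends a letter.

    Valid for encodings having the prefix property, where a letter is
    recognized as soon as its last digit has been received.
    """
    marks, current = [start], ""
    for position in range(start, len(bits)):
        current += bits[position]
        if current in codes:
            marks.append(position + 1)
            current = ""
    return marks

def first_common_boundary(bits, offset, codes=CODES):
    """First boundary shared by the in-phase and out-of-phase decipherings."""
    in_phase = set(boundaries(bits, 0, codes))
    for mark in boundaries(bits, offset, codes):
        if mark in in_phase:
            return mark
    return None

message = "1100101"                       # E B C
print(boundaries(message, 0))             # [0, 2, 5, 7]
print(boundaries(message, 1))             # [1, 3, 5, 7]
print(first_common_boundary(message, 1))  # 5
```

Here the two decipherings come together after the fifth digit, because the codes 001 (in phase) and 01 (out of phase) end in the same sequence of digits.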

However, if we had been using a fixed-length encoding, such as the 
sixth encoding of Table VI, in which all of the codes have a fixed length 
k, there would be exactly k different phases in which the deciphering 
circuit might find itself, and the circuit could never make a transition 
between them. No pair of different codes can end in exactly the same 
sequence of digits, and so no two of these phases can become synchro- 
nized. Each of these phases will have all of the codes ending after j 
digit times, and after k + j, 2k + j, etc., where j is the remainder ob- 
tained on dividing the position of the symbol “:” by k, and hence j 
can take on k different possible values. 

Also, even in the case of variable-length encodings, if all of the code 
lengths are divisible by some integer k, then there will be at least k 
different phases. For if the position of one occurrence of the symbol “:” 
has remainder j when divided by k, the position of all other occurrences 
of the symbol “:” in this phase of decipherment will have the same re- 
mainder. 
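The largest integer k to which this remark applies is the greatest common divisor of the code lengths. A minimal Python sketch, with an illustrative function name, computes it for the encodings of Tables V and VI:

```python
from functools import reduce
from math import gcd

def guaranteed_phases(codes):
    """Largest k dividing every code length.

    If k > 1, there are at least k phases which can never merge.
    A result of 1 gives no information either way.
    """
    return reduce(gcd, (len(code) for code in codes))

third_of_table_v = ["00", "0011", "01", "0111", "10"]
sixth_of_table_vi = ["00", "01", "10", "11"]
first_of_table_v = ["000", "001", "01", "10", "11"]

print(guaranteed_phases(third_of_table_v))    # 2
print(guaranteed_phases(sixth_of_table_vi))   # 2
print(guaranteed_phases(first_of_table_v))    # 1
```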

The above remarks apply strictly to exhaustive encodings, but may 
not apply where there are certain sequences of digits which can never 
occur. For if such a sequence of digits does occur, this may be used by 
the circuit as a special indication that it is out of phase, and hence it 
may be possible to build auxiliary circuits which can cause resynchroni- 
zation, even when a fixed-length encoding is used. So a more complete 
treatment of synchronization would allow such auxiliary circuits, but 
here we will consider only self-synchronization, which is carried out 
inherently by the same means as is used for deciphering. 

To speak more precisely about the self-synchronizing properties, we 
will make some definitions. Given any encoding C and any 

    finite sequences x and y such that x is not the 
    enciphered form (with respect to encoding C)          (16) 
    of any message, and xy is a presumed message, 

if z is a finite sequence of binary digits such that both xyz and yz are 
complete enciphered messages, we will say that z is a synchronizing se- 
quence for x and y. As an example, we have seen in Table VII that 
011110100111000 is a synchronizing sequence for 1 and 110. 
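The definition translates directly into a test that can be run on small examples. The Python sketch below assumes that a presumed message means a sequence which is a prefix of some enciphered message; that reading, and all of the names, are assumptions made for the illustration. The first encoding of Table V is used as the example.

```python
def split_points(bits, codes):
    """Positions i such that bits[:i] is a concatenation of codes."""
    reachable = {0}
    for i in range(1, len(bits) + 1):
        if any(i - len(c) in reachable and bits[i - len(c):i] == c
               for c in codes):
            reachable.add(i)
    return reachable

def is_message(bits, codes):
    """True if `bits` is the enciphered form of a complete message."""
    return len(bits) in split_points(bits, codes)

def is_presumed_message(bits, codes):
    """Assumed definition: `bits` is a prefix of some enciphered message."""
    return any(any(c.startswith(bits[i:]) for c in codes)
               for i in split_points(bits, codes))

def is_synchronizing_sequence(x, y, z, codes):
    """z is a synchronizing sequence for x and y, with x and y as in (16)."""
    condition_16 = (not is_message(x, codes)
                    and is_presumed_message(x + y, codes))
    return (condition_16
            and is_message(x + y + z, codes)
            and is_message(y + z, codes))

FIRST = ["000", "001", "01", "10", "11"]     # first encoding of Table V
print(is_synchronizing_sequence("1", "10", "01", FIRST))   # True
print(is_synchronizing_sequence("1", "10", "0", FIRST))    # False
```

In the first call, xyz = 11001 deciphers as E B and yz = 1001 deciphers as D C, so both decipherings end together.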

Given any uniquely decipherable encoding C which has some codes 
of length more than 1, exactly one of the three statements given below 
will hold: 

i. For all (16), there is no z such that z is a synchronizing sequence 
for x and y. The encoding C will then be said to be never-self-synchroniz- 
ing. 

ii. For each (16), there is a z which is a synchronizing sequence for x 
and y. The encoding C will then be said to be completely self-synchronizing. 

iii. For some (16), there is a synchronizing sequence for x and y, 
but for other (16), there is no synchronizing sequence for x and y. The 
encoding C will then be said to be partially self-synchronizing. 


Furthermore, we will define a sequence z to be a universal synchronizing 
sequence for the encoding C if, for all (16), this same sequence z is a 
synchronizing sequence for x and y. 


Theorem 15: Given an exhaustive encoding C, then C is completely self- 
synchronizing if and only if there exists a z which is a universal synchroniz- 
ing sequence for C. 

Proof: A universal synchronizing sequence clearly satisfies the condi- 
tions of the definition of completely self-synchronizing, so it remains 
only to construct a universal synchronizing sequence, given that there 
is a synchronizing sequence for each pair of finite sequences x and y 
satisfying (16). By the 
exhaustive property, there is a code consisting entirely of 0’s. We will 
assume that there are k 0’s in this code. We will construct our z by 
starting with $N_{\max}$ 0’s, where $N_{\max}$ is the length of the longest code of 
C; after this, there are only k different phases in which the circuit could 
be. Then we find a synchronizing sequence for two of these phases (for 
instance, a synchronizing sequence for 00 and 0), and put this next after 
our sequence. Next we put on the sequence of $N_{\max}$ 0’s again. There are 
now at most k — 1 phases to synchronize, and, adding on sequences for 
these one at a time, we eventually construct our desired universal 
synchronizing sequence. 
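For small encodings a universal synchronizing sequence can also be found by direct search. The Python sketch below is not the construction used in the proof: it is a breadth-first search over sets of decoder states, and it assumes an exhaustive encoding having the prefix property, for which a sequence z is taken to be universal when it brings the decoder to a letter boundary from every state (every proper prefix of a code). The names are illustrative.

```python
from collections import deque

def decoder_states(codes):
    """All proper prefixes of codes, including the empty string."""
    return {code[:i] for code in codes for i in range(len(code))}

def step(state, bit, codes, states):
    """State of the decoder after one more digit is received."""
    nxt = state + bit
    if nxt in codes:
        return ""                  # a letter has just been completed
    if nxt in states:
        return nxt
    raise ValueError("encoding is not exhaustive")

def is_universal(z, codes):
    """True if z leaves the decoder at a boundary whatever its phase."""
    states = decoder_states(codes)
    for state in states:
        for bit in z:
            state = step(state, bit, codes, states)
        if state != "":
            return False
    return True

def find_universal(codes):
    """Shortest universal synchronizing sequence, or None if there is none."""
    states = decoder_states(codes)
    start, goal = frozenset(states), frozenset({""})
    queue, seen = deque([(start, "")]), {start}
    while queue:
        current, z = queue.popleft()
        if current == goal:
            return z
        for bit in "01":
            nxt = frozenset(step(s, bit, codes, states) for s in current)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, z + bit))
    return None

FIRST = {"000", "001", "01", "10", "11"}     # first encoding of Table V
SIXTH = {"00", "01", "10", "11"}             # sixth encoding of Table VI
print(find_universal(FIRST), is_universal("1001", FIRST))
print(find_universal(SIXTH))                 # None
```

The search terminates because there are only finitely many sets of states; for the fixed-length sixth encoding it exhausts them without success, in agreement with the remarks on fixed-length encodings.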

The alphabetical encoding of Table I can be shown by Theorem 15 
to be completely self-synchronizing, since the sequence 010001011 is a 
universal synchronizing sequence for this encoding. The message AD 
has this sequence as its enciphered form. In addition, there are many 
other short universal synchronizing sequences for this encoding, such 
as the enciphered forms of #Y, AY, BD, BY, EY, HI, ID, JO, JU, 
MW, NY, OW, PO, PU, TY, etc. Since just these digraphs listed here 
occur as about three per cent of all digraphs in connected English text,’ 
it can be seen that, if English text were transmitted by use of this 
encoding, it would be quite likely to synchronize itself very quickly. 
In fact, it is easy to see that any exhaustive encoding which is com- 
pletely self-synchronizing will synchronize itself with probability 1 if 
the messages sent have the successive letters independently chosen with 
any given set of probabilities, assuming only that all of these probabilities 
are positive numbers. This will occur since the probability of a universal 
synchronizing sequence occurring at any given time is positive, and, if 
we wait long enough, this will have happened with probability 1. 

The fact that this occurs with probability 1 does not make it quite 
certain to occur, and, in fact, it is possible to choose arbitrarily long 
sequences of English words which do not contain a universal synchroniz- 
ing sequence. An example of such a sequence for the alphabetical encod- 
ing of Table I is 


CHECK # SYNCHRONISM # OF # LONG # FILTHY 
# CHUCKLE # HEH # HEH # HEH # HEH . . . 


But such a sequence is extremely unlikely to continue indefinitely in any 
practical communication system or record-keeping system. Also, slight 
complications of the encoding could permit certain sequences which are 
certain to occur in English text (such as a period followed by a space 
symbol) to be universal synchronizing sequences. 

One quality which might be worth comparing for various proposed 
encodings under consideration for possible use might be the average 
speed with which they synchronize themselves, when carrying typical 
traffic. This speed could be calculated from a sufficiently good knowledge 
of the statistics of the traffic, but it could more easily be measured 
experimentally, either by the use of actual enciphering and deciphering 
circuits, or by simulating their behavior on a digital computer. 
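A simulation of this kind is easy to set up. The Python sketch below is one possible arrangement and not a measurement reported in the paper: it draws letters independently with the probabilities of Table V, enciphers them in the first encoding of that table, starts a second decoder one digit late, and counts the digits received before the two decipherings share a letter boundary. The trial counts, seed and names are arbitrary choices.

```python
import random

CODES = {"A": "000", "B": "001", "C": "01", "D": "10", "E": "11"}
PROBS = {"A": 0.330, "B": 0.005, "C": 0.330, "D": 0.005, "E": 0.330}
DECODE = {code: letter for letter, code in CODES.items()}

def boundaries(bits, start):
    """Letter boundaries found by a prefix-property decoder started at `start`."""
    marks, current = set(), ""
    for position in range(start, len(bits)):
        current += bits[position]
        if current in DECODE:
            marks.add(position + 1)
            current = ""
    return marks

def digits_until_synchronized(bits, offset):
    """Digits received after `offset` before both decoders share a boundary.

    Returns None if they never come together within `bits`.
    """
    common = boundaries(bits, 0) & boundaries(bits, offset)
    return min(common) - offset if common else None

def simulate(trials=200, letters=500, seed=1959):
    """One synchronizing time (or None) per random message."""
    rng = random.Random(seed)
    alphabet, weights = list(PROBS), list(PROBS.values())
    results = []
    for _ in range(trials):
        text = rng.choices(alphabet, weights=weights, k=letters)
        bits = "".join(CODES[letter] for letter in text)
        results.append(digits_until_synchronized(bits, offset=1))
    return results

times = [t for t in simulate() if t is not None]
print(len(times), sum(times) / len(times) if times else None)
```

Only a starting error of one digit is tried here; a fuller experiment would average over all starting phases and over realistic traffic statistics.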

The synchronization problem occurs not only when the equipment 
is first turned on, but also in transmission systems for which there is a 
noisy channel. For if some digits of a message encoded in a variable- 
length encoding are changed, the change may cause the circuits to get 
out of synchronism by the change of a short code into the prefix of a 
long one, or vice versa. Also, of course, temporary malfunctions of the 
enciphering or deciphering circuit themselves might cause them to get 
out of phase. 

It may be of interest to enumerate the known results about combina- 
tions of synchronizing properties and lengths of the codes of exhaustive 
encodings. 

If an exhaustive encoding has a fixed length (all codes having length 
the same integer k), then it must be 


never-self-synchronizing. (17) 


If an exhaustive encoding has all the lengths of its codes divisible by 
some integer k > 1, but these lengths are not all equal to k, then it 
must be one of the following: 

never-self-synchronizing, (18) 
partially self-synchronizing. (19) 


If an exhaustive encoding has the greatest common divisor of the 
lengths of its codes equal to 1, then it must be one of the following: 


completely self-synchronizing, (20) 
partially self-synchronizing, (21) 
never-self-synchronizing. (22) 


Of the above six cases, (17), (19) and (20) occur very much more 
commonly than the others. In fact, it is very difficult to construct 
examples of the other three, unless you deliberately set out to do so. 
The following theorems will give indications of the fact that cases (18) 
and (22) are hard to obtain. 

Theorem 16: Given an exhaustive encoding which is never-self-synchroniz- 
ing, if we let 

$$Q = \sum_i N_i \, 2^{-N_i},$$ 

then Q will always be an integer. 

It can be seen that, in the case of a fixed-length code, Q will be the 
length. However, no one of the exhaustive encodings (except those 
having fixed length) listed so far in this paper has an integer value for 
Q. Rather than give the full details of a rigorous proof of Theorem 16, 
only the main ideas involved will be explained. The sum Q is the average 
length of the codes obtained by deciphering a presumed message, if the 
presumed message was obtained by choosing 0’s and 1’s as successive 
digits by independent choices having probability one-half. If we put 
such a random presumed message into the deciphering circuit, we have 
several different phases in which it may be deciphered. By the never- 
self-synchronizing property no two of these phases can ever come to- 
gether. 

Let H be the set of all prefixes of the presumed message. Then two 
of these prefixes will be said to be of the same phase if they are of the 
form β and βα, where α is the enciphered form of a complete message. 
The set H is subdivided by the equivalence relation “being of the same 
phase” into B distinct sets, where B is the number of phases. By sym- 
metry, the probability that any two given members of H will be of the 
same phase is equal, and, since each phase occurs with equal probability 
and the sum of all of them is 1, each phase occurs with probability 1/B, 
where B is the number of phases. However, Q was the expected difference 
in length between a given member of H and its next longer member; 
hence, we will have Q = B. 
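The quantity Q is easily computed for any list of codes. The short Python sketch below takes Q to be the sum, over all codes, of the code length N multiplied by 2 to the power −N, which is the reading consistent with the remark that Q equals the length for a fixed-length code; the function name is illustrative.

```python
from fractions import Fraction

def q_sum(codes):
    """Q = sum of N * 2**(-N) over the codes, N being the code length."""
    return sum(Fraction(len(code), 2 ** len(code)) for code in codes)

sixth_of_table_vi = ["00", "01", "10", "11"]         # fixed length 2
first_of_table_v = ["000", "001", "01", "10", "11"]  # variable length

print(q_sum(sixth_of_table_vi))   # 2
print(q_sum(first_of_table_v))    # 9/4
```

Since 9/4 is not an integer, Theorem 16 shows that the first encoding of Table V cannot be never-self-synchronizing.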

Theorem 17: Given an exhaustive encoding C, C is never-self-synchroniz- 
ing if and only if its reversal C* has the prefix property. 

Suppose that C is not never-self-synchronizing. By the definition of 
synchronizing sequence, there exist finite sequences x, y and z such that 
x is not the enciphered form of a message, but yz is the enciphered form 
of message $m_1$, and xyz is the enciphered form of message $m_2$. 

For some values of n the last n letters of $m_1$ may agree with the last 
n letters of $m_2$. But, by the fact that x is not the enciphered form of a 
message, there is a largest value of n for which this is true. Let this 
largest value be n’, and let the letters which are n’ + 1 from the end of 
$m_1$ and $m_2$, respectively, be called $L_1$ and $L_2$. Then $C(L_1)$ and $C(L_2)$ are 
both suffixes of the same message (the previous part of xyz), and hence 
the reversed form of one of them is a prefix of the reversed form of the 
other. 

The converse follows more readily, since, if β and βα are both codes 
of C*, then the reversed form of β is a synchronizing sequence for the 
reversed form of α and the null sequence. 

To return to the problem of which of cases (17) through (22) can occur, 
it can easily be shown by the use of Theorems 16 and 17 that, among 
all exhaustive encodings in which not all codes are of the same length, 
the only ones which are never-self-synchronizing and have fewer than 
16 letters in their alphabet are the encoding which encodes a nine-letter 
alphabet by using the list of codes (000, 0010, 0011, 01, 100, 1010, 1011, 
110, 111), and the reversal of this encoding. This encoding is due to 
Schützenberger (Ref. 5). 
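Theorem 17 can be checked on this nine-letter encoding by machine. The Python sketch below tests the reversal for the prefix property and, independently, searches all pairs of decoder states for two different phases which reach a letter boundary at the same digit. The search assumes an exhaustive encoding having the prefix property; it and the names used are an illustration, not the argument of the paper.

```python
from collections import deque

NINE = {"000", "0010", "0011", "01", "100", "1010", "1011", "110", "111"}
FIRST = {"000", "001", "01", "10", "11"}     # first encoding of Table V

def reversal(codes):
    """The encoding C*: every code written in reverse order."""
    return {code[::-1] for code in codes}

def has_prefix_property(codes):
    """True if no code is a prefix of a different code."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def can_synchronize(codes):
    """True if two decoders in different phases can ever come into phase."""
    states = {code[:i] for code in codes for i in range(len(code))}

    def step(state, bit):
        nxt = state + bit
        return "" if nxt in codes else nxt

    # One decoder is at a boundary while the other is part way through a code.
    start = {(s, "") for s in states if s}
    queue, seen = deque(start), set(start)
    while queue:
        a, b = queue.popleft()
        for bit in "01":
            pair = (step(a, bit), step(b, bit))
            if pair == ("", ""):
                return True
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False

print(has_prefix_property(reversal(NINE)), can_synchronize(NINE))    # True False
print(has_prefix_property(reversal(FIRST)), can_synchronize(FIRST))  # False True
```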

This provides an example showing that case (22) can occur. That 
(21) can occur is shown by an encoding (derived from the above by 
composition) using the list of codes (000000, 0000010, 0000011, 00001, 
000100, 0001010, 0001011, 000110, 000111, 0010, 0011, 01, 100, 1010, 
1011, 110, 111). 

It is also possible to construct an example of case (18), but the one 
we have found is too complicated to be worth presenting here. 


IX. ONE REALIZATION FOR ENCIPHERING AND DECIPHERING CIRCUITS 


Some reluctance to use variable-length encodings has been based on 
the opinion’ that it is hard to build circuits to encipher or decipher 
them. Descriptions will be given below for one circuit for doing each 
of these, using principally just a shift register and a combinational 
translating circuit. Since using any code requires having a combinational 
translating circuit, and since presumably most devices using coded 
alphabetical information are likely to cause it to pass through a shift 
register, the kind of circuit described below would add very little com- 
plexity to such machines, and would automatically give them the self- 
synchronizing property, in the case of most variable-length binary 
encodings. 

Fig. 1 Block diagram of enciphering circuit 

The enciphering circuit, shown in Fig. 1, contains a shift register 
containing the words “HAS JUST BEEN ENCIPHERED” followed 
by a binary digit 1 and a string of zeros as long as the longest code which 
can occur in the variable-length encoding. We will assume that it is in 
such a state as to have the zeros as shown, although it can easily be seen 
that it will get into this state if it starts in any other condition. 

The circuit of Fig. 1 also contains an input reader (which can for 
concreteness be thought of as a punched paper tape reader, although it 
could be a buffer or other input device), which can read in one letter 
at a time whenever it is given a pulse on the lead labelled “to advance 
input”. 

The recognition circuit, which consists of a multiple-input OR circuit 
followed by a negation circuit, gives an output whenever there are as 
many binary zeros present as there are in the illustration. This sends a 
signal to enable the gate, letting the code corresponding to the next 
letter be read into the locations previously occupied by the 1 and all of 
the zeros. However, the translating circuit, which translates the letters 
into this encoding, instead of being designed to give directly the original 
variable-length encoding, gives an encoding which differs from it by 
having an extra “1” added to the end of each code. The output of the 
recognition circuit also goes to advance the input, reading in the next 
letter to be converted, after passing through a delay sufficient to be 
sure that the gate is now no longer enabled. This delay prevents the 
letter being translated from changing while it is being gated into the 
output shift register. 

As soon as the new code has been read into the shift register, it begins 
to be shifted along to the left in Fig. 1. The 1 at the end of the code 
serves to mark the end of the code during this shifting, but it will be 
eliminated from the enciphered form of the message. The shift register 
is connected so that, when it is shifted, a 0 appears at the right end. 
As soon as the 1 passes beyond the end of the recognition circuit, there 
will be only zeros present, and hence the recognition circuit will again 
recognize the end of a letter and repeat the cycle as given above. 

Instead of having a counter or a special sequential circuit to keep 
track of where the current letter ends, this has been done here by add- 
ing a single binary digit to the code and adding one to the length 
of the required shift register. 
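The marker-digit idea can be imitated in software. The Python sketch below is a loose analogy to the circuit of Fig. 1, not a gate-level model: the register is represented as a string, the "translating circuit" is a table lookup that appends the extra 1, and the "recognition circuit" is the test that only 0's remain. The register width and the names are assumptions made for the illustration.

```python
def encipher_with_marker(message, codes):
    """Encipher `message`, locating the end of each code by a marker 1."""
    width = max(len(code) for code in codes.values()) + 1
    output = []
    for letter in message:
        # Translating circuit: the code with an extra 1 on the end, gated
        # into the cleared register and followed by 0's.
        register = (codes[letter] + "1").ljust(width, "0")
        while True:
            # One shift to the left; a 0 appears at the right end.
            leaving, register = register[0], register[1:] + "0"
            if "1" not in register:
                # Recognition circuit: only 0's remain, so the digit which
                # has just left was the marker; it is not transmitted.
                break
            output.append(leaving)
    return "".join(output)

FIRST = {"A": "000", "B": "001", "C": "01", "D": "10", "E": "11"}
print(encipher_with_marker("EBC", FIRST))   # 1100101
```

No counter is needed: the position of the single marker digit carries the information about where the current letter ends.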

Similarly, an analogous scheme can be used to decipher from a variable 
length code into any other representation for letters, by using one special 
position in the shift register, as shown in Fig. 2. This deciphering circuit 
can be built only for encodings having the finite delay property, although 
the enciphering circuit of Fig. 1 can be used for any binary encoding. 

The shift register into which the digits to be deciphered are shifted 
is divided into two halves, which will be called the left half and the 
right half. The right half has e digit positions, where e is the excess delay 
of the encoding. The left half has $N_{\max} + 1$ digit positions, with the 
extra 1 being used to mark the end of those digits which already have 
been deciphered. 

At the beginning of the cycle we will assume that the left half of the 
shift register has just been cleared to the state shown in Fig. 2, that is, 
to contain $N_{\max}$ 0’s followed by a 1. Next, the digits of the message to 
be deciphered shift toward the left. Since the 1 precedes them, it marks 
clearly how many of these digits have been shifted into the left half. 
As soon as all of the digits of the code of the first letter of the message 
have been shifted into the left half, the translating circuit will then 
give its outputs. It gives the translated codes for the letter, as well as 
giving another output, w, which equals 1 only when the complete first 
letter is present. The translating circuit makes use of the inputs from 
only the left half of the shift register, ignoring the digits in the right 
half, unless the code C present in the left half is a code which is also a 
prefix of another code. It makes use only of those digits from the right 
half which are necessary to distinguish between this code and the par- 
tially shifted-in code of which it is a prefix. It gives the output w = 1 
whenever the entire code for the first letter of the enciphered message 
has been shifted over into the left half and, whenever only a prefix of 
the code of the first letter is there, the output w will equal zero. 

This output w will then cause the left half to clear back to its original 
state, and, after a delay sufficient to allow the output to be received, it 
gives the “to advance output” signal to the output punch or buffer. 

Fig. 2 Block diagram of deciphering circuit. 





966 THE BELL SYSTEM TECHNICAL JOURNAL, JULY 1959 


The deciphering circuit then repeats the above cycle for the next letter 
of the message. 
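For an encoding having the prefix property the excess delay e is 0, the right half of the register is empty, and the deciphering cycle reduces to the following Python sketch. It is a software analogy to Fig. 2 under that assumption: the register is a string of width one more than the longest code, the translating circuit is a dictionary keyed on the entire register contents, and a successful lookup plays the part of the output w = 1. The names are illustrative.

```python
def decipher_with_marker(bits, codes):
    """Decipher `bits` for an encoding having the prefix property (e = 0)."""
    width = max(len(code) for code in codes.values()) + 1
    # Register contents when a complete code is present: 0's, the marker 1,
    # then the digits of the code.
    table = {("1" + code).rjust(width, "0"): letter
             for letter, code in codes.items()}
    cleared = "1".rjust(width, "0")     # 0's followed by a 1
    register = cleared
    output = []
    for bit in bits:
        register = register[1:] + bit   # shift one place to the left
        letter = table.get(register)    # found: w = 1; not found: w = 0
        if letter is not None:
            output.append(letter)
            register = cleared          # clear back to the original state
    return "".join(output)

FIRST = {"A": "000", "B": "001", "C": "01", "D": "10", "E": "11"}
print(decipher_with_marker("1100101", FIRST))   # EBC
```

Register contents which correspond neither to a code nor to a prefix of one never arise, which is the software counterpart of the "don't care" entries of the translating circuit.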

The translating circuit of this deciphering circuit must give the ap- 
propriate outputs whenever the complete code for the first letter is 
present in the left half of the shift register, and must give w = 1 in these 
cases. It must also be designed to give the output w = 0 whenever an 
incomplete prefix of the first letter is present, but, since in general there 
may be many states of the shift register which do not correspond to 
either a letter or a prefix, there may be many “don’t cares’’ occurring 
in the design of this translating circuit, which will permit it to be simpler 
than a completely specified function having this many inputs. 

The time delay between the receipt of the beginning of an N-digit 
code for a letter and the actual sending of this letter to the output punch 
or buffer will be N + e, which may sometimes be slightly longer than 
the delay d of the message. However, the circuit for doing the deciphering 
in the minimum time would be more complicated, in that it would not 
always clear the shift register to the same state, so it is not presented 
here. 

However, in the enciphering circuit given in Fig. 1 there is only a 
delay of one digit time, while the message is shifted through the one 
extra stage at the left end of the shift register. Hence, neither of these 
two circuits operates in quite the minimum possible time, since speed 
has been sacrificed for simplicity of construction. 


X. FURTHER PROBLEMS 


There are many further problems suggested by the ideas discussed 
in this paper, and which we have not been able to solve. Are there any 
binary encodings which satisfy (9) other than the exhaustive encodings 
and their reversals? Are there any encodings C which satisfy (9) and 
such that both C and its reversal C* have the finite delay property 
without both C and C* having the prefix property? Given an encoding 
which is uniquely decipherable but which does not possess the finite 
delay property, does the set of presumed messages having infinite delay 
always form a finite set? Does it always form a set of measure zero? Is 
there a simple polynomial in $N_{\max}$ and n which will be an upper bound 
to the delay of any encoding having the finite delay property? Are the 
encodings for which the algorithm of Sardinas and Patterson? fails to 
terminate precisely the same as the encodings having infinite delay? 
Given any encoding having infinite delay, is there a Turing machine 
(perhaps having several tapes and several reading and writing heads on 
each) which can decipher any K-digit message in a length of time which 
is less than a constant times K? 

VARIABLE-LENGTH BINARY ENCODINGS 967 

Recurrent Codes: Easily Mechanized, 
Burst-Correcting, Binary Codes 


By D. W. HAGELBARGER 


(Manuscript received April 14, 1959) 


A class of codes capable of correcting multiple errors is described. Some 
of these codes can be implemented with considerably less hardware than was 
needed for previous multiple error-correcting codes. A general method is 
shown for constructing a code of redundancy 1/b that will correct error 
bursts of Kb or fewer digits (K and b integers). The logical design of the en- 
coder and decoder, as well as the guard space requirement of good digits be- 
tween bursts of errors, is described. 


I. INTRODUCTION 


In adapting the existing telephone network to high-speed digital data 
transmission, an error control problem arises. Most of the circuits were 
designed primarily for voice-type signals and considerable attention 
was given to control of thermal noise. Because of the high redundancy 
of speech, impulse noise on the lines is usually not even noticed by the 
telephone users, and hence has not been a serious problem. On the other 
hand, high-speed digital data (especially numerical data) contain little 
redundancy, and the noise pulses may resemble the signal pulses and 
thus cause errors. 

The deliberate introduction of redundancy to detect and correct 
transmission errors has been used for some time. Early systems¹ used 
repetition of characters and duplication of channels. There were two 
schemes which sent pictures of the characters using raster scans. By the 
late 1930’s a radio telegraph system² using a 3-out-of-7 code for error 
detection had been patented and telephone apparatus using 2-out-of-5 
codes³ was being designed. 

Most, if not all, of the recent work on error-detecting or error-correct- 
ing codes stems from Hamming’s Systematic Parity Check codes.⁴ These 
codes will correct a single error per block of digits. Since then, much work 
has been done on codes for multiple errors (see Refs. 5 through 16). The 
assumption has usually been made that the errors are statistically in- 
dependent. On many communication channels, however, the errors are 
not independent but tend to come in groups. For example, a lightning 
stroke may knock out several adjacent telegraph pulses. These groups 
of errors are called “bursts”. Codes for detecting and correcting bursts 
have been proposed by Abramson," Gilbert,!® Hamming" and Meyer." 
The class of codes described here differs in that the block structure has 
been minimized; the resulting symmetry allows very simple mech- 
anization and also simplifies the synchronization problem. Since the 
block size is small, the codes also fit very naturally into systems where 
the data must be accepted and delivered continuously, rather than in 
batches. 


II. EXAMPLE 


Before describing the general recurrent code we will give a particularly 
simple example. Assume that we wish to correct bursts of length six or 
less. The simple code has every other digit a check digit, giving a re- 
dundancy of one-half. The encoder is illustrated in Fig. 1. It consists of 
a shift register of length seven. The data digits enter from the left 
(Position 1) and are shifted through the register before being transmitted. 
For each shift we generate a check digit so that the parity (number of 1’s) 
of the check digit and the data digits in the first and fourth positions of 
the shift register is even (zero or two). This check digit is transmitted 
before the data digit in the seventh position. Then a shift is made; the 
data digit which was in Position 7 is transmitted, and a new check digit 
is calculated. This process is illustrated in Fig. 2; the successive lines are 
one shift time apart. During this time interval one new data digit is 
accepted by the encoder and two digits, one data, one check, are trans- 
mitted. 

Fig. 2 Timing chart of digits moving through the encoder. 

The decoder is shown in Fig. 3. The received code enters the switch 
where the alternating data and check digits are separated, the check 
digits going to the lower shift register and the data digits to the upper 
one. There are two copies of the parity circuits, R and S, each one check- 
ing the parity relation imposed by the encoder. The decoding rule is: 


Whenever both R and S fail, change the data digit in Position 4 
(0 → 1, 1 → 0) while shifting it to Position 5. If only one parity 
check fails, make no change. 


The corrected data digits are available at Position 5 of the data shift 
register. In a burst of length six or less, there can be at most three data 
digits and three check digits wrong. The parity relation used in encoding 
involves digits which are spread far enough apart so that no burst of six 
or less will affect more than a single digit in any one parity group. 
After any burst of length six or less, a 20-digit errorless message is 
enough to completely refill the decoder shift registers. Hence, the next 
burst can be corrected without interference from a previous burst, if 
there is a “clean” message 19 or more digits long between bursts. 

Fig. 3 Decoder. 

TABLE I 

Burst     Data Shift         Check Shift        Guard Space 
Length    Register Length    Register Length    Between Bursts 

  4             5                  7                 13 
  6             7                 10                 19 
  8             9                 13                 25 
 10            11                 16                 31 

If the message is to be retransmitted the check digits can be corrected 
at Position 10 of the check digit shift register (Fig. 3). The rule is: 


Whenever parity check R holds and parity check S fails, change 
the digit in Position 10 of the check digit register. 


A data digit error cannot cause parity R to hold and parity S to fail, 
because this would require an error in Position 7 of the data register, 
and, since all data digit errors are corrected in going from Position 4 to 
Position 5, this cannot happen. 

This particular coding scheme fails first for a burst of length seven, 
consisting of a check digit and another check digit six digits later. When 
just these two check digits are wrong, the decoder assumes that a data 
digit is wrong and changes it. 
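
The decoding rule and the failure just described can be tried out with a small simulation. The sketch below is not a transcription of Fig. 3; it applies the two parity checks R and S directly to the received digits. It assumes that the encoder register starts with all 0's and that the message is followed by nine 0's, so that every message digit passes through both checks. All names are ours.

```python
def encode(data):
    reg = [0] * 7                              # seven-stage encoder register
    out = []
    for d in data:
        reg = [d] + reg[:-1]
        out += [reg[0] ^ reg[3], reg[6]]       # check digit, then data digit
    return out


def decode(code):
    checks = code[0::2]                        # the switch separates check digits
    data = list(code[1::2])                    # from data digits
    for j in range(6, len(data) - 3):
        # R: the parity group in which data digit j is the middle member
        r_fails = checks[j - 3] ^ data[j] ^ data[j + 3]
        # S: the group in which it is the last member (digit j - 3 is already corrected)
        s_fails = checks[j - 6] ^ data[j - 3] ^ data[j]
        if r_fails and s_fails:                # change the digit only if both fail
            data[j] ^= 1
    return data


def add_burst(code, start, pattern):
    noisy = list(code)
    for offset, bit in enumerate(pattern):
        noisy[start + offset] ^= bit           # a 1 in the pattern changes the digit
    return noisy


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
sent = encode(message + [0] * 9)
clean = decode(sent)
fixed = decode(add_burst(sent, 20, [1, 1, 0, 1, 1, 1]))       # burst of length 6
fooled = decode(add_burst(sent, 20, [1, 0, 0, 0, 0, 0, 1]))   # two check digits six apart
print(fixed == clean, fooled == clean)
```

If the sketch is faithful to the rule, the burst of length six is removed completely, while the pair of check-digit errors causes exactly one good data digit to be changed.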

A demonstration device using this code has been built. It consists of a 
punched tape reader, an encoder, a transmission line, a decoder and a 
tape printer. The circuitry uses relays and can be operated fast, slow or 
one step at a time. Digits in the encoder, transmission line and decoder 
are displayed on lamps. The transmission line has switches for inserting 
errors in the encoded message. Other lamps are used to indicate parity 
check failures. An auxiliary circuit can be used for detecting overlong 
bursts, flashing a yellow lamp whenever the decoder has a burst and 
locking up a red lamp whenever a detectable overlong burst occurs. 

There is an extension of this code to correct bursts of any even length. 
We merely spread out the parity check so that the burst can only affect 
one term in any parity group. To correct bursts of length 2K or less 
requires an encoding shift register of length 2K + 1. The decoding data 
digit register has the same length, 2K + 1, and the check digit register 
must be 3K + 1. The parity checks involve digits K apart in the registers. 
A clean message of length 6K + 1 will always be sufficient separation 
between bursts. Table I shows typical values. 
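
Since these sizes depend only on K, the typical values are easily tabulated. A short sketch, with a function name of our own choosing:

```python
def example_code_parameters(K):
    """Sizes for the redundancy one-half code correcting bursts of length 2K or less."""
    return {
        "burst_length": 2 * K,
        "data_register": 2 * K + 1,    # encoder register and decoder data register
        "check_register": 3 * K + 1,   # decoder check-digit register
        "guard_space": 6 * K + 1,      # clean digits sufficient between bursts
    }


for K in range(2, 6):
    print(example_code_parameters(K))
```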


III. GENERAL RECURRENT CODE 


The general (binary) recurrent code is constructed as follows: 
We are given a message to be transmitted consisting of data digits. 

Fig. 4 — Portion of encoded message. 

We will add to this check digits to form the encoded message. The 
encoded message is divided into blocks of length b. One position in the 
block is assigned to be a check digit† and the rest are data digits. Since 
the block must have at least one data digit, the shortest block length is 
b = 2. (This is the value used in the example of Section II.) The data 
digits are loaded into the data digit positions in the order received. The 
check digit is determined by a parity relation applied once for each 
block. This parity relation extends over a selected set of the digits in p 
consecutive blocks. (To be useful against bursts, p must be at least 2 and 
usually will be larger.) Fig. 4 shows a portion of the encoded message 
from the example of Section II. The data and check bits are indicated 
by D’s and C’s; the blocks are marked off with commas and the parity 
relation is shown by lines having *’s over the digits in a given parity 
group. Note that p for this example is 7; that is, any one of the parity 
groups extends over seven blocks. Every parity group has three digits 
in it, two data and one check; thus, each data digit is in two parity 
groups and each check digit is in only one parity group. 

We will denote which digits enter into the parity relation by b binary 
words of p digits each. We call these the parity words and label them 
$P_1, P_2, \ldots, P_b$. Consider p consecutive blocks. We form $P_1$ by 
observing the first position of each block; if the digit is in this parity 
group we write 1, otherwise 0. Then $P_2$ depends on the second positions 
of these p blocks, and so on. This is illustrated in Fig. 4. We have used 
the parity group indicated by the arrow and written the parity words so 
that the digits fall under the corresponding blocks. Thus $P_1$ has 1’s in 
the first and fourth blocks and $P_2$ has a 1 only in the seventh block. 
(The numbering of digits and blocks here and in Fig. 4 is from right to 
left. Fig. 4 can be considered a “snapshot” of the encoded message on the 
transmission line. This is different from the numbering method used below, 
where the digits are considered to be flowing past a fixed point, and are 
labeled with numbers which increase with time.) With this notation, 
there is a simple correspondence between the 1’s in the parity words 
and the connections to the shift registers of the encoder and decoder. 

† Codes with more than one check digit per block are possible, but it can 
be shown that, for a given efficiency and burst-correcting ability, they 
always require more complicated encoding and decoding equipment than would 
the equivalent code with one check digit per block. 

Another way to describe these codes is to think of the digits leaving 
the encoder as being numbered serially; if X; is the ith digit, then the 
parity relation is given by a recurrent equation of the form: 


$$X_{r+kb} \oplus X_{s+kb} \oplus \cdots \oplus X_{w+kb} = \text{constant},$$ 


where the constant is 0 or 1, $\oplus$ means sum modulo 2, k takes successive 
integral values, b is the block length, and r, s, ..., w denote which digits 
enter into the parity group. 

If we order the terms so that r < s < ... < w, then 


$$p - 2 < \frac{w - r}{b} < p.$$ 


The requirement that every position in the block must be represented 
at least once implies that each of the integers 0, 1, ..., (b - 1) must 
occur at least once as a remainder upon dividing r, s, ..., w by b. 

For the example of Section II we have: 


$$X_{1+2k} \oplus X_{8+2k} \oplus X_{14+2k} = 0.$$ 


IV. BURSTS 


Consider an encoded message flowing through a communication chan- 
nel. A burst occurs and some of the digits of the message are changed. 
We will describe the burst pattern by a binary word having a 1 for each 
changed digit and a 0 for each correct digit. We require that the first 
and last digits of the word both be 1’s, since there is no point in 
including correct digits which are outside of the bursts. The length of the 
burst is the number of digits in the word. There are $2^{l-2}$ different 
burst patterns of length l (for l ≥ 2) and $2^{l-1}$ different burst 
patterns of length l or less. The latter are the odd binary numbers having 
l or fewer digits. For 
example, the eight burst patterns of length 4 or less are: 1, 11, 101, 111, 
1001, 1011, 1101 and 1111. 
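
Because the burst patterns are just the odd binary numbers, they can be listed mechanically. A minimal sketch (the function name is ours):

```python
def burst_patterns(max_length):
    """All burst patterns of length max_length or less, as binary strings."""
    # In binary, every odd number begins and ends with a 1.
    return [bin(n)[2:] for n in range(1, 2 ** max_length, 2)]


print(burst_patterns(4))
# ['1', '11', '101', '111', '1001', '1011', '1101', '1111']
```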

The effect of a burst on the encoded message depends on the phase of 
the burst pattern with respect to the block structure; hence, we will 
have to consider $b\,2^{l-1}$ different possible bursts of length l or 
less if the code has a block length b. We will indicate particular bursts 
either by showing a portion of the encoded message with the erroneous 
digits marked with *’s or by listing the erroneous digits with superscripts 
to indicate which block a particular digit occupies. For example, 


    **   *  * 
... DC, DC, DC, DC, ... 


is the same as $D^0C^0C^1D^2$. 


V. SYNDROMES 

At the decoder we will always have a circuit for checking the parity 
relation imposed by the encoder. This circuit gives an output once for 
each block received. If the parity check fails, the output is a 1; 
otherwise, it is a 0. In a practical transmission system, there are usually 
no errors and the check circuit has an output of all 0’s. When a burst of 
errors does occur, the check circuit will give a pattern of 1’s and 0’s. 
This pattern will be used to identify the burst, and hence we shall call 
it the syndrome. Since the syndromes occur immersed in a string of 0’s, 
only binary words having 1’s on both ends (odd numbers) can be used 
as syndromes and, as above, there are $2^{k-1}$ possible different 
syndromes with k or fewer digits. 

Our first problem is to choose the parity relation in such a fashion 
that each burst of length l or less has a distinct syndrome.† A further 
problem is to choose the parity relation so that, given a distinct 
syndrome, the correction of the burst which caused it is easily mechanized. 
That is, we want a systematic scheme for correcting a burst, given the 
corresponding syndrome, which is much better than having a table of 
all possible syndromes with the corresponding bursts. 

The procedure for calculating the syndrome corresponding to a given 
burst is as follows: 

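[Sample syndrome calculation for a burst of length 6 in the code of 
Section II: the parity words P_D = 1001000 and P_C = 0000001, one for each 
erroneous digit and each shifted by its block number, are written under 
the message and added modulo 2 to give the syndrome.] 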


† If our code has a block length b which is a power of 2, then it is 
possible for $b\,2^{l-1} = 2^{k-1}$. A close-packed recurrent code is one 
which has all possible syndromes of k or fewer digits in a one-to-one 
correspondence with all possible bursts of length l or less. Since this is 
of more mathematical than practical interest, examples are deferred to 
Appendix I. 

Number the blocks, starting with 0 for the first block containing an 
error, and continue the numbering far enough to include all blocks having 
errors in the burst under consideration. (The direction of numbering 
should be such that increasing numbers represent digits received at 
later times.) Write the parity word for the error in block number 0. 
Under this write the parity words for any other errors in this block. 
Now write the parity words for the other errors, shifting each one (in 
the direction of increasing time) the number of places equal to the block 
number in which the error occurs. The syndrome is the sum modulo 2 
of these parity words. In the sample above we use superscripts on the 
parity words to indicate in which block each error occurred. This shows 
the syndrome for the indicated burst of length 6, using the code of Sec- 
tion II. Note that there is never more than a single 1 in any column. 
By spreading out the parity words with 0’s sufficiently to prevent any 
interaction of errors (in a burst not larger than the design maximum), 
we allow the use of a simple circuit for recognizing the parity words and 
correcting the errors one at a time. In other words, the decoder for our 
example of Section II is simple because we have a single circuit for cor- 
recting a data-digit error, which is time-shared by every data digit. 
Each data digit is compared with four other digits, all far enough away 
from each other so that not more than one of these five digits can be in 
error due to an allowable burst. Under these circumstances, it is an easy 
matter to decide if the particular data digit needs correcting. 
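
The procedure for forming a syndrome can be written out as a short routine. In the sketch below the parity words of the Section II code are taken to be 1001000 for a data digit (D) and 0000001 for a check digit (C), written with earlier times on the left. The burst is an assumed illustration with five erroneous digits in blocks 0 to 2, not necessarily the one in the sample above, and the function and variable names are ours.

```python
def syndrome(errors, parity_words):
    """Syndrome of a burst, following the procedure of this section.

    errors       -- list of (digit name, block number) pairs, first block numbered 0
    parity_words -- dict mapping each digit name to its parity word,
                    written with earlier times on the left
    """
    column = {}
    for name, block in errors:
        for i, bit in enumerate(parity_words[name]):
            if bit == "1":
                # shift the parity word by the block number and add modulo 2
                column[i + block] = column.get(i + block, 0) ^ 1
    width = max(column, default=-1) + 1
    word = "".join(str(column.get(i, 0)) for i in range(width))
    return word.strip("0")           # a syndrome is read between its outer 1's


section_2_code = {"D": "1001000", "C": "0000001"}
burst = [("D", 0), ("C", 0), ("D", 1), ("D", 2), ("C", 2)]   # assumed example
print(syndrome(burst, section_2_code))
```

For this burst no two shifted parity words have a 1 in the same column, so each 1 of the syndrome, 111111101, can be traced to a single erroneous digit.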


VI. LOWER REDUNDANCY CODES 


We can apply the same technique, spreading out the parity words 
with 0’s so as to avoid interactions between errors in a burst, to make 
the redundancy as low as we wish and still have a relatively simple 
correcting mechanism. For instance, if we desire a code with a redundancy 
of one-quarter good for bursts of length 4, we choose the following 
parity words (the digits are spaced to emphasize the method of forming): 


P_A    111 000 000 0 
P_B    000 101 000 0 
P_C    000 000 110 0 
P_D    000 000 000 1 


Note that placing the code 100 to the extreme right allowed us to shorten 
the parity words by dropping the last two columns, which were all 0’s. 
In assigning the parity words, the groups of 1’s should be arranged to 
go from upper left to lower right as shown above. That is, the order of 
the groups of 1’s in the parity words should agree with the order of the 
digits in the block structure. If this is not done, extra columns of 0’s 
must be inserted in the array of parity words, which would mean more 
shift register stages in the encoder and decoder. The difficulty is illus- 
trated as follows: 


               Code I            Code II 
P_A         111 000 000 0     000 111 000 0 
P_B         000 101 000 0     101 000 000 0 


             *  * 
Burst:  ... ABCDAB ... 

P_A^0       1110000000        0001110000 
P_B^1        0001010000        1010000000 


Syndrome:   1110101            10011 
            [O.K.]             [N.G.] 


                * * 
(The burst ... ABCDAB ... is not allowed, since the above codes are 
for bursts of length 4 or less.) If we wish to make a code of the same 
redundancy good for bursts of length 8 or less, we form the parity words 
by inserting 0’s between each of the digits of the above code, giving: 


P_A    10101 000000 000000 00 
P_B    00000 010001 000000 00 
P_C    00000 000000 010100 00 
P_D    00000 000000 000000 01 


Fig. 5(a) shows an encoder for this code. The data digits enter from the 
left side and are cyclically switched to the three shift registers by the 
input commutator. The buffers allow the shift registers to be stepped 
together. A check digit, calculated from the parity of the indicated 
positions of the shift registers, is transmitted with the digits from the 
last positions of the registers by the output commutator. Note the 
correspondence between the parity words and the shift registers. The top 
register comes from $P_A$, the second from $P_B$, and so on. Each digit of 
a parity word becomes a stage of shift register. The stages representing 
1’s are connected to the parity circuit; those representing 0’s are not. 
If $P_D$, which has a single 1 at the right end, is assigned to the check 
digit, no shift register is required at the encoder for this parity word. 
(However, one is needed in the decoder.) 

At the decoder, Fig. 5(b), the incoming digits are commutated to the 
four shift registers (synchronization must be maintained so that the 
digits get in the proper registers). As each block arrives at the decoder, 
the parity relation is checked; if it fails, a 1 is put in the syndrome reg- 
ister at the bottom of the figure. Suppose that a digit in the top register 
is wrong; it will cause parity failures as it goes through Positions 1, 3 
and 5. When it is in Position 5 the syndrome register will have 1’s in 
Positions R, S and T. This will enable the AND circuit, which will cor- 
rect the error as it shifts from Position 5 to Position 6. In a similar man- 
ner, an error in the second register is corrected between Positions 11 
and 12, and an error in the third register between Positions 17 and 18. 
Whenever a 1 reaches Position 7 of the syndrome register, any 1’s in 
Positions R and S are cleared on the next shift. (If for some reason it 
should be desirable to correct the check digits rather than discard them, 
this can be done by adding the extension to the bottom register shown 
dotted.) Because of the way the taps for the parity circuit are spaced, 
any register can correct two adjacent errors, and the system is good 
against bursts of length 8 or less. 

Fig. 5 (a) Encoder; (b) decoder. 

It takes 92 digits entering the decoder to completely refill all the 
registers (including the syndrome register). Thus, a guard space of 91 
good digits between bursts is sufficient to assure that there is no inter- 
action between bursts. 

In general, a procedure for constructing a code of redundancy 1/b 
(block length b) is as follows: 


Take the first b odd binary numbers and let L be the number of digits 
in the largest one. Extend each of these numbers to an L-digit word 
by adding zeros to the right end. Now form a square array with b 
rows and b columns. The entries in the array are L-digit words. Put 
the above-formed words along the main diagonal, with the word 
having a single 1 going in the lower right corner. The order of the 
other words on the diagonal is arbitrary. Fill in all remaining words 
with zeros. Now replace the right-hand column of L-digit words with 
single-digit words: 1 in the bottom row, 0 elsewhere. (Strike out the 
last L - 1 digit columns from the right.) 

The rows of this array are the parity words of the desired code. The 
order from top to bottom is the (“snapshot”) order of occurrence of the 
corresponding digits in the block structure of the message.* This code 
will correct all bursts of length b or less. To make a code good for bursts 
of length Kb or less, add K - 1 zeros between each adjacent pair of 
digits of the above parity words. 

If the odd binary numbers were not increased to L digits as above, 
certain otherwise allowable bursts could cause syndromes which would 
be incorrectly interpreted by the decoder. For instance, 001 and 110 
might add to form 001110. The procedure given prevents this type of 
difficulty. 
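
The construction can be followed step by step in code. The sketch below is one reading of the procedure: it uses the odd binary numbers 1, 11, 101, ..., and, since the order of the other words on the diagonal is arbitrary, it simply places them in descending order. The function name and that ordering are our choices.

```python
def parity_words(b, K=1):
    """Parity words for block length b, good for bursts of length K*b or less."""
    odd = [bin(n)[2:] for n in range(1, 2 * b, 2)]   # first b odd numbers: 1, 11, 101, ...
    L = len(odd[-1])                                 # digits in the largest one
    padded = [w.ljust(L, "0") for w in odd]          # zeros added at the right end
    diagonal = padded[:0:-1]                         # every word except "10...0", descending
    rows = []
    for i, word in enumerate(diagonal):
        # zero words on either side of the diagonal entry; the shortened
        # right-hand column contributes a single 0
        rows.append("0" * (L * i) + word + "0" * (L * (b - 2 - i)) + "0")
    rows.append("0" * (L * (b - 1)) + "1")           # bottom row: the single-1 word
    spacer = "0" * (K - 1)
    return [spacer.join(row) for row in rows]        # K - 1 zeros between adjacent digits


for word in parity_words(4):
    print(word)
```

With b = 4 this gives the four parity words of the redundancy one-quarter example, and `parity_words(2, 3)` gives 1001000 and 0000001, the parity words of the code of Section II.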


VII. SHIFT REGISTER AND GUARD SPACE REQUIREMENTS 


If we design codes by the method indicated in the previous section, 
the method is regular enough to allow us to give formulas for the shift 
register stages and guard space.† 

Definitions: 

b, l, k, L(b) are positive integers. 

* The bottom row is the parity word for the digit that is transmitted first in 
any block. See the direction of rotation of the output commutator in Fig. 5(a). 

† If the block length is not a power of 2 it may be possible to save a little 
from these calculations by taking advantage of the fact that one or more of the 
odd numbers is not used. For example, the burst-length 3, redundancy one-third 
decoder shift registers can be reduced from 24 to 21 stages. 

The block length is b with one check digit; hence, the redundancy is 
1/b. The code is to be usable against bursts of length l or less, where 
l = kb. L(b) is the smallest integer such that 


$$L(b) \ge 1 + \log_2 b.$$ 


Thus, 

b       2  3  4  5  6  7  8  9  10 
L(b)    2  3  3  4  4  4  4  5  5 

The encoder will have b - 1 registers, each of length 

$$\frac{l}{b}\,L(b)\,(b-1) + 1,$$ 

or 

$$(b-1)^2\,\frac{l}{b}\,L(b) + b - 1$$ 

stages. The decoder will have 

$$l\,(b-1)\,L(b) + 2b + \frac{l}{b}\,L(b) - l$$ 

stages. 


The guard space can be shown to be 

$$b\,l\,L(b) + b - l - 1.$$ 


Table II shows some typical values. (Efficiency is 1-redundancy.) 
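
The formulas of this section can be evaluated directly. The sketch below uses them as we read them: the encoder has b - 1 registers of (l/b)L(b)(b - 1) + 1 stages each, the decoder has l(b - 1)L(b) + 2b + (l/b)L(b) - l stages, and the guard space is blL(b) + b - l - 1. The function names are ours.

```python
def L(b):
    """Smallest integer L with L >= 1 + log2(b)."""
    return (b - 1).bit_length() + 1


def stages_and_guard(b, l):
    """(encoder stages, decoder stages, guard space) for block length b, burst length l."""
    k, remainder = divmod(l, b)
    if remainder:
        raise ValueError("the burst length l must be a multiple of b")
    encoder = (b - 1) * (k * L(b) * (b - 1) + 1)
    decoder = l * (b - 1) * L(b) + 2 * b + k * L(b) - l
    guard = b * l * L(b) + b - l - 1
    return encoder, decoder, guard


for b, l in [(2, 2), (2, 6), (3, 3), (4, 8), (5, 10)]:
    print(b, l, stages_and_guard(b, l))
```

For b = 4 and l = 8 this gives 57 encoder stages, 78 decoder stages and a guard space of 91, in agreement with the example of Section VI.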


VIII. DETECTION OF OVERLONG BURSTS 


Any error-correcting system fails when the errors get beyond its cor- 
recting capabilities. In some cases, it is desirable to detect that a burst 
has occurred which the system cannot correct. In Section II we men- 
tioned that the code for that example failed on a burst consisting of two 
check-digit errors exactly six apart. The effect of this burst is to cause 
the decoder to change the data digit which is common to the parity 
groups containing these check digits. This is a fundamental difficulty of 
the particular code and cannot be avoided, since such a burst converts 
our encoded message to what looks like a different encoded message 
with a single data-digit error. In general, we cannot correct or even 
detect bursts which convert one encoded message to another encoded 
message, or to another encoded message modified by an allowable burst 
(assuming we still wish to correct allowable bursts). For the example 
of Section II there are: 

four bursts of length 9, 
two bursts of length 8, 
one burst of length 7, 


which convert one encoded message to another encoded message mod- 
ified by an allowable burst. These seven bursts are not detectable, but 
all other bursts of length 9 or less are detectable and, of course, all bursts 
of length 6 or less are correctable. In connection with the code demon- 
strator described in Section II, a circuit which detects overlong bursts 
by monitoring the sequence of failures of the two parity circuits in Fig. 
3 has been built. All 512 bursts of length 9 or less have been tried on it, 
and it rings the alarm on all the uncorrected bursts except the seven 
listed above. 

In the code of Section II the syndrome for a single data error is the 
same as the syndrome for a pair of check-digit errors six apart. To im- 
prove the error-detection capabilities of our code, we can always change 
the parity words so that the first occurrence of two bursts giving the 
same syndrome involves bursts longer than the correctable one. 

As an example, consider the code given by the parity words 


P_D = 1001001000000, 
P_C = 0000000001001. 


This particular code permits a very simple overlong-burst-detection 
scheme. It corrects or detects all bursts of length 13 or less and corrects 
all bursts of length 6 or less. The encoder and decoder are shown in 
Fig. 6. Note that, in the process of correcting an allowable burst, the 
three parity-check circuits R, S and T cannot have either of the 
failure patterns 010 or 101. It happens that one or the other of these 
patterns occurs for every burst of length 13 or less that is not corrected 
by the decoder. 


TABLE II 

                                    Shift Register Stages 
Block     Efficiency    Burst       ---------------------    Guard 
Length    (per cent)    Length      Encoder     Decoder†     Space 

  2           50           2            3           8           7 
                           4            5          12          13 
                           6            7          16          19 
                           8            9          20          25 
                          10           11          24          31 

  3           67           3           14          24          26 
                           6           26          42          50 
                           9           38          60          74 

  4           75           4           30          43          47 
                           8           57          78          91 

  5           80           5           68          89          99 
                          10          132         168         194 

† The decoder here is different from the one for Table I; this one has a 
syndrome register. 

The decoder here shows an alternative arrangement compared to 
Fig. 5. Instead of the syndrome shift register, we have three copies of 
the parity check circuit shifted so that Position 7 of the data register 
is the only one common to all three. If we were to make a circuit similar 
to Fig. 5, we would save the S and T parity circuits, the last five stages 
of the data register and the last six stages of the check register in ex- 
change for a seven-stage syndrome register with its reset circuit. 


IX. SYNCHRONIZATION 


In general, there are two synchronization problems with any binary 
code, bit synchronization and block synchronization. For purposes of 
bit synchronization, it is desirable to have transitions from 0 to 1 or 
1 to 0 at some minimum rate. By proper choice of the number of terms 
in the parity relation and also whether the parity is odd or even we can 
prevent either of the codes 00000... or 11111... from occurring in the 
encoded message, regardless of what the input to the encoder may be. 
It is probably also desirable to exclude these codes as possible encoded 
messages since they are the most likely messages to be put out by an 
encoder with something stuck on or off. Table III shows which codes 
cannot occur. 


Fig. 6 — (a) Encoder; (b) decoder. 


TABLE III — IMPOSSIBLE CODES 

                          Number of Terms 
Number of Ones        Odd            Even 

Odd                   00000...       00000..., 11111... 
Even                  11111...       none 


With the redundancy one-half code, the block synchronization problem 
is minimized, since there are only two phases that the decoder can 
have. One possibility is to make a decoder with two equal shift registers 
and two copies of the error-correcting circuit, one wired in each of the 
possible phases. The fact that the wrong parity circuits fail half of the 
time can be used to tell which phase to use. 


APPENDIX 


Examples of Close-Packed Codes 


All of the examples known to date are for burst length two. It is not 
too hard to show that there is no close-packed code of redundancy one- 
half good for burst length three (see Table IV). 


TABLE IV 

Redundancy    Burst       Syndrome 

1/2           A^0         111 
              B^0         001 
              A^0B^0      110 
              B^1A^0      101 

1/4           A^0         1100 
              B^0         0111 
              C^0         1101 
              D^0         1001 
              A^0B^0      1011 
              B^0C^0      1010 
              C^0D^0      0100 
              D^1A^0      1111 

1/8           A^0         01100 
              B^0         10111 
              C^0         11100 
              D^0         01101 
              E^0         10010 
              F^0         11101 
              G^0         11001 
              H^0         10011 
              A^0B^0      11011 
              B^0C^0      01011 
              C^0D^0      10001 
              D^0E^0      11111 
              E^0F^0      01111 
              F^0G^0      00100 
              G^0H^0      01010 
              H^1A^0      10101 


Representation of Switching Circuits by 
Binary-Decision Programs 


By C. Y. LEE 


(Manuscript received May 8, 1958) 


A binary-decision program is a program consisting of a string of two- 
address conditional transfer instructions. The paper shows the relationship 
between switching circuits and binary-decision programs and gives a set of 
simple rules by which one can transform binary-decision programs to switch- 
ing circuits. It then shows that, in regard to the computation of switching 
functions, binary-decision programming representation is superior to the 
usual Boolean representation. 


I. INTRODUCTION 


In his 1938 paper,' Shannon showed how relay switching circuits can 
be represented by the language of symbolic logic and designed and 
manipulated according to the rules of Boolean algebra. This far-reaching 
step provided an algebraic language for a systematic treatment of switch- 
ing and logical design problems and provided a root system from which 
new art can grow and flourish. 

We may want to know, however, if there might not be other ways of 
representing switching functions and circuits, and to compare such repre- 
sentations with the algebraic representation of Shannon. In this paper 
we will give a new representation of switching circuits, and will call this 
representation a “‘binary-decision program.” 

Binary-decision programs, as the reader will see, are not algebraic in 
nature. They are, therefore, less easily manipulated. A switching circuit 
may be simplified not by simplifying its binary-decision program, but 
by essentially finding for it a better binary-decision program. A good 
binary-decision program generally means one which is well-knit and 
makes efficient use of subroutines; it is good in the sense then that a 
computer program is good. Binary-decision programs do not seek out 
series-parallel circuits, but are more suited for representing circuits with 
a large number of transfers. In these respects, binary decision programs 
therefore differ very greatly from the usual Boolean representation. 

Fig. 1 Typical switching circuit. 


The characteristic which sets binary decision programs still further 
apart from Boolean representation and gave this study its initial stimu- 
lation is in the computation of switching functions. It is here that we will 
give direct evidence of the superiority of binary-decision programs. 


II. STRUCTURE OF BINARY-DECISION PROGRAMS 

A binary-decision program is based on a single instruction 

T x; A, B. 


This instruction says that, if the variable x is 0, take the next instruction 
from program address A, and if x is 1, take the next instruction from 
address B. Every binary-decision program is made up of a sequence of 
instructions of this kind. 
Take, for example, the switching circuit shown in Fig. 1. This circuit 
is described exactly by the following binary-decision program: 
1. T x; 2, … 
2. T y; θ, 3 
3. T … 
4. T … 
5. T … 
The program is actually a sequential description of the possible events 
that may occur. We begin at program address 1 by examining the 
variable x. If x should be 0, we go to address 2 and examine y. If y is 0, 
we go to address θ; otherwise we go to address 3, and so forth. The symbols 
θ and I indicate whether the circuit output is 0 or 1. From a computer 
viewpoint, they can be the exit addresses once the circuit output value is 
known. 


III. CONSTRUCTION OF SWITCHING CIRCUITS FROM BINARY-DECISION 
PROGRAMS 


The question that we will consider here is this: Suppose the logical 
requirements of a switching circuit are given; when would it be possible 
to design the circuit according to the following procedure: 


Logical requirements → Binary-decision program → Switching circuit 


In various examples we have tried, this approach has given us a fresher 
look at things and, in several instances, has given us rather good cir- 
cuits. The process of going from binary-decision programs to switching 
circuits is very well defined, so that how good a circuit we get depends 
entirely upon how good a binary-decision program we can write. Roughly 
speaking, if a problem has a fairly sizable set of logical requirements to 
begin with, it would call for a well-organized array of subroutines in the 
binary-decision program, which are called in as the need arises. Never- 
theless, there are many exceptions, and it is very hard to say where 
ingenuity ends and routine process begins. 

Let us now state the rules on how a switching circuit can be con- 
structed from a binary-decision program. 

Rule 1. Each address of the binary-decision program corresponds to a 
node of the circuit. 

Rule 2. If at address A the instruction is T x; B, C, a variable x′ 
should be connected between nodes A and B and a variable x should be 
connected between nodes A and C. 

Rule 3. The node corresponding to address 1 is the input node. The 
node corresponding to address I is the output node. 

A simple change in Rule 3 will enable us to get the negative of the 
switching circuit. This is done by making the output node the node 
corresponding to address θ rather than to address I. 

We will illustrate our procedure by two examples. 


3.1 Example 1 


We wish to design a circuit with six switching variables, a,b,c and 
x,y,z. Let M be the binary number abc and N be the binary number xyz. 
Then the output is to be 1 whenever M ≥ N. 

The problem says that two binary numbers M and N are to be 
compared; these two numbers may be compared one bit at a time. We may 
compare the most significant bits a and x first. If a = 0 and x = 1, 
then M < N, so that there is no output. If a = 1 and x = 0, then 
M > N, and the output will be 1. If a = 0 and x = 0 or a = 1 and 
x = 1, the comparison process must be continued to the next pair of 
bits. 


Fig. 2 — Switching circuit for Example 1. 


The program proceeds by first comparing the pair of variables a and 
x. Depending on the values of a and x, b and y are compared next and, 
lastly, c and z, if necessary. The program instructions are 

Address    Instruction 
A          T a; A1, A2 
A1         T x; A3, θ 
A2         T x; I, A3 
A3         T b; A4, A5 
A4         T y; A6, θ 
A5         T y; I, A6 
A6         T c; A7, I 
A7         T z; I, θ 


Following the three rules, we begin with the first instruction and let 
A correspond to the input node of the circuit. A branch labeled a′ leads 
to the node A1 and a branch labeled a leads to the node A2. A1 and A2 
thus become internal nodes of the circuit. From A1 we need to put down 
only the branch x′ leading to internal node A3, since a branch labeled 
x would give no output. Continuing in this way, we get all the paths 
from the input to the output; the addresses tell us where the interconnec- 
tions between these paths are to be made. The circuit for this example 
is given in Fig. 2. From this circuit it can be seen that the three circled 
variables are superfluous and can be deleted. 
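
The program of Example 1 can be checked mechanically. The following is a minimal sketch in Python, not part of the original paper. It assumes the instruction convention of Rule 2 (first address taken when the variable is 0, second when it is 1), reads the instructions at A2 and A7 as x; I, A3 and z; I, θ, and uses the hypothetical names `THETA` and `I` for the two exits.

```python
from itertools import product

# Address -> (variable tested, next address if 0, next address if 1).
# "THETA" is the exit with output 0 and "I" the exit with output 1.
PROGRAM = {
    "A":  ("a", "A1", "A2"),
    "A1": ("x", "A3", "THETA"),
    "A2": ("x", "I", "A3"),
    "A3": ("b", "A4", "A5"),
    "A4": ("y", "A6", "THETA"),
    "A5": ("y", "I", "A6"),
    "A6": ("c", "A7", "I"),
    "A7": ("z", "I", "THETA"),
}


def run(program, start, values):
    """Follow the transfers from `start` until an exit is reached."""
    address = start
    while address not in ("THETA", "I"):
        variable, if_zero, if_one = program[address]
        address = if_one if values[variable] else if_zero
    return 1 if address == "I" else 0


def check_example_1():
    """Compare the program with the specification M >= N on all 64 inputs."""
    for a, b, c, x, y, z in product((0, 1), repeat=6):
        values = dict(a=a, b=b, c=c, x=x, y=y, z=z)
        m = 4 * a + 2 * b + c
        n = 4 * x + 2 * y + z
        if run(PROGRAM, "A", values) != int(m >= n):
            return False
    return True
```

Under these assumptions `check_example_1()` confirms that the eight instructions realize the comparison M ≥ N.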


3.2 Example 2 


We wish to design a switching circuit with eight variables, a,b,c,d and 
w,x,y,z. Let L be the number of the variables a,b,c,d which are in the 
0 state and let R be the number of the variables w,x,y,z which are in 
the 0 state. Then the output is to be 1 whenever L ≥ R. 

The problem tells us that, if all of the variables a,b,c,d are 0, the 
output would be 1 regardless of what w,x,y,z are; if exactly three of the 
variables a,b,c,d are 0, then the output would be 1 whenever three or 
fewer of the variables w,x,y,z are 0; and so forth. To program this 
problem, we will therefore begin with the following subroutines: 

S1. The output is to be 1 whenever three or fewer of the variables 
w,x,y,z are 0. 

S2. The output is to be 1 whenever two or fewer of the variables 
w,x,y,z are 0. 

S3. The output is to be 1 whenever one or fewer of the variables 
w,x,y,z is 0. 

S4. The output is to be 1 whenever none of the variables w,x,y,z is 0. 

The program is then completed by counting the number L of the 
variables a,b,c,d which are 0. If L is 4, the output is made 1 directly; 
if L is 3, 2, 1 or 0, the program enters subroutine S1, S2, S3 or S4 
respectively. 

We will begin with the subroutine programs. The subroutine S1 is a 
successive scan of the states of the variables w,x,y,z: 

Address   Instruction 
S1        T w; S11, I 
S11       T x; S12, I 
S12       T y; S13, I 
S13       T z; θ, I. 


The subroutine S2 can be written likewise. A moment’s reflection will 
show, however, that S2 can make use of a portion of the instructions of 
S1. Similarly, S3 can make use of S2 and S4 can make use of S3. The 
subroutine programs come out to be: 

Address   Instruction 
S1        T w; S11, I 
S11       T x; S12, I 
S12       T y; S13, I 
S13       T z; θ, I 

S2        T w; S21, S11 
S21       T x; S22, S12 
S22       T y; θ, S13 
S3        T w; S31, S21 
S31       T x; θ, S22 
S4        T w; θ, S31. 

The main program which evaluates L and selects the appropriate 
subroutine is 


Address   Instruction 
A         T a; A1, A2 
A1        T b; A3, A4 
A3        T c; A5, A6 
A5        T d; I, S1 
A6        T d; S1, S2 
A4        T c; A6, A7 
A7        T d; S2, S3 
A2        T b; A4, A8 
A8        T c; A7, A9 
A9        T d; S3, S4. 
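
The main program and the shared subroutines of Example 2 can be checked in the same way. The sketch below is in Python and is not part of the original paper. It assumes the readings A3, A5, A6, A7, A8, A9 for the main-program addresses and θ for the zero-output exit, and it uses the hypothetical names `THETA` and `I` for the two exits.

```python
from itertools import product

EXITS = ("THETA", "I")

# Address -> (variable tested, next address if 0, next address if 1).
SUBROUTINES = {
    "S1":  ("w", "S11", "I"),
    "S11": ("x", "S12", "I"),
    "S12": ("y", "S13", "I"),
    "S13": ("z", "THETA", "I"),
    "S2":  ("w", "S21", "S11"),
    "S21": ("x", "S22", "S12"),
    "S22": ("y", "THETA", "S13"),
    "S3":  ("w", "S31", "S21"),
    "S31": ("x", "THETA", "S22"),
    "S4":  ("w", "THETA", "S31"),
}
MAIN = {
    "A":  ("a", "A1", "A2"),
    "A1": ("b", "A3", "A4"),
    "A3": ("c", "A5", "A6"),
    "A5": ("d", "I", "S1"),
    "A6": ("d", "S1", "S2"),
    "A4": ("c", "A6", "A7"),
    "A7": ("d", "S2", "S3"),
    "A2": ("b", "A4", "A8"),
    "A8": ("c", "A7", "A9"),
    "A9": ("d", "S3", "S4"),
}
PROGRAM = {**MAIN, **SUBROUTINES}


def run(start, values):
    """Follow the transfers from `start` until an exit is reached."""
    address = start
    while address not in EXITS:
        variable, if_zero, if_one = PROGRAM[address]
        address = if_one if values[variable] else if_zero
    return int(address == "I")


def mismatches():
    """Return every input on which the program disagrees with L >= R."""
    bad = []
    for bits in product((0, 1), repeat=8):
        values = dict(zip("abcdwxyz", bits))
        left = sum(values[v] == 0 for v in "abcd")    # L: zeros among a,b,c,d
        right = sum(values[v] == 0 for v in "wxyz")   # R: zeros among w,x,y,z
        if run("A", values) != int(left >= right):
            bad.append(bits)
    return bad
```

With these readings `mismatches()` is empty, so the twenty instructions agree with the specification L ≥ R on all 256 inputs.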


Following the three rules of construction, the final circuit is given in 
Fig. 3. To show how the subroutine circuits can be combined in stages, 
the circuit for S1 is given in Fig. 4 and the combined circuit for S1 and 
S2 is given in Fig. 5. 

Generally, we find that the switching circuits constructed from 
binary-decision programs have several distinct characteristics. The 
construction does not distinguish among series-parallel, bridge and 
nonplanar circuits, but it is restricted to unidirectional flow of 
current in any branch. Because of the transfer characteristic of the 
program instruction, the program in most cases gives bridge or 
nonplanar circuits and very rarely gives series-parallel circuits. 
Again, because of the transfer characteristic of the instruction, the 
procedure tends to give circuits having a large number of transfers, 
causing unnecessary appearance of variables in the circuits. On the 
other hand, the presence of the transfers prevents sneak paths, which 
are often a source of worry. 

Fig. 3 — Switching circuit for Example 2. 

Fig. 4 — Circuit for S1 of Example 2. 


IV. COMPUTATION OF SWITCHING FUNCTION BY BINARY-DECISION PROGRAMS 


The problem that we wish to consider here is this: Suppose, in carry- 
ing out a complicated task, a complex decision depending on many 
variables is to be made and made repeatedly. Question: What procedure 
should one follow so as to arrive at the decision quickly and without 
having to go through a large amount of computation? 

To make the problem more tractable, let us say that the decision 
function is a switching function of n variables. The problem is to find 
a good procedure for the computation of this function. 























Fig. 5 — Combined circuit for S1 and S2 of Example 2. 

The first question one should ask is, perhaps, what are the choices? 
If the switching function is, say, 

$$x(y'z \lor yz') \lor x'yz,$$

what alternatives in computation are there? 

There are indeed many alternatives. One may, for instance, carry 
out the following direct computation: 

1. y AND z’, store result in location a; 
2. y’ AND z, store result in location b; 
3. (contents of a) OR (contents of b), store result in location a; 
4. x AND (contents of a), store result in location a; 
5. x’ AND y, store result in location b; 
6. z AND (contents of b), store result in location b; 
7. (contents of a) OR (contents of b), store answer in location b. 

Or, one may go back to the switching function itself and rewrite it as 

$$(y \lor z)\,[x \oplus (yz)],$$

where $\oplus$ (called SUM) stands for addition modulo 2. A direct 
computation now becomes: 
1. y OR z, store result in location a; 
2. y AND z, store result in location b; 
3. x SUM (contents of b), store result in location b; 
4. (contents of a) AND (contents of b), store answer in location b. 
We have done in four steps what took seven above. 
Finally, we may write for this switching function a binary-decision 
program: 
1. T x; 2, 4. 
2. T y; θ, 3. 
3. T z; θ, I. 
4. T y; 3, 5. 
5. T z; I, θ. 
This computation scheme differs from the other two in one major 
aspect, namely, the number of steps one needs to go through in the 
binary-decision program before one gets an answer depends upon the 
initial values of the variables. For example, we will arrive at an answer 
in two or three steps according as (xyz) = (001) or (xyz) = (110). To 
compare this program with the second procedure (with AND, OR and 
SUM), therefore, let us sum over all eight combinations of the three 
variables. This yields a total of 22 steps for the binary-decision program, 
and 32 for the AND-OR-SUM procedure. On the other hand, the binary- 
decision program is longer by one instruction. 
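
The step counts quoted above can be reproduced with a short sketch in Python, which is not part of the original paper. It assumes the five-instruction program 1. T x; 2, 4, 2. T y; θ, 3, 3. T z; θ, I, 4. T y; 3, 5, 5. T z; I, θ, counts one step per instruction executed, and uses the hypothetical exit names `THETA` and `I`.

```python
from itertools import product

# Address -> (variable tested, next address if 0, next address if 1).
PROGRAM = {
    1: ("x", 2, 4),
    2: ("y", "THETA", 3),
    3: ("z", "THETA", "I"),
    4: ("y", 3, 5),
    5: ("z", "I", "THETA"),
}


def run(values):
    """Return (output, number of instructions executed)."""
    address, steps = 1, 0
    while address not in ("THETA", "I"):
        variable, if_zero, if_one = PROGRAM[address]
        address = if_one if values[variable] else if_zero
        steps += 1
    return int(address == "I"), steps


def f(x, y, z):
    """The switching function x(y'z v yz') v x'yz, written directly."""
    return (x & ((1 - y) & z | y & (1 - z))) | ((1 - x) & y & z)


def total_steps():
    """Sum the executed instructions over all eight input combinations."""
    total = 0
    for x, y, z in product((0, 1), repeat=3):
        output, steps = run({"x": x, "y": y, "z": z})
        assert output == f(x, y, z)
        total += steps
    return total
```

The four-instruction AND-OR-SUM procedure always executes every instruction, so its total over the eight combinations is 4 × 8 = 32, against the sum returned by `total_steps()` for the binary-decision program.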

The questions that we want to answer here are these: 

1. For an arbitrary switching function of n variables what is the 
order of magnitude of the number of instructions in its binary-decision 
program? 

2. Comparing specifically the binary-decision program approach with 
the AND-OR-SUM procedure, which will in general need fewer 
instructions and which will need less time to execute? 

Questions of this nature but pertaining to the number of relay contacts 
or electronic components have been studied by Shannon? and Muller.* 
As is the case with their investigations, we are not able to answer these 
questions for individual functions but our answers apply to an over- 
whelming fraction of switching functions of n variables. 


4.1 Computation by Binary-Decision Programs. 


Let f be a switching function of n variables, and let $\mu(f)$ be a number 
such that no binary-decision program representing f has fewer than $\mu(f)$ 
instructions. We will let $\mu_n$ be the smallest number of instructions 
sufficient to represent any switching function of n variables. That is, 

$$\mu_n = \max\{\mu(f) \mid f \in F(n)\},$$

where F(n) is the set of all switching functions of n variables. Then 

Lemma 1: $\mu_n > 2^n/2n$. 

Proof: Let N(n,p) denote the number of possible binary-decision 
programs involving n variables with p instructions. Since each 
instruction can be chosen from at most $np^2$ instructions, it follows that 
$N(n,p) \le (np^2)^p$ and, in particular, $N(n,\mu_n) \le (n\mu_n^2)^{\mu_n}$. 

Now suppose $\mu_n \le 2^n/2n$. Then 

$$N(n,\mu_n) \le \left(n\,\frac{2^{2n}}{4n^2}\right)^{2^n/2n} = \left(\frac{2^{2n}}{4n}\right)^{2^n/2n} < 2^{2^n},$$

that is, there are fewer such programs than there are switching 
functions of n variables, and the lemma follows. 

To find an upper bound for $\mu_n$ requires an interesting subroutine 
technique. Let a set of programs each of which computes a switching 
function be called a library. Let L(n) be the library of programs which 
represent all $2^{2^n}$ switching functions of n variables. Then 

Lemma 2: The library L(n) can be written so that it contains not 
more than $2^{2^n}$ instructions. 
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Proof: For n = 2, the 16 functions of two variables are 
Lae: 9. x22’. 
10. Xy'X0'. 
ll. a V 2. 
12. x,’ Vv 2. 
13. a V ae. 
14. 2,’ v 2’. 
15. 2,'t%o V 24%’. 
LyX. 16. 2% V 2y'22'. 
(2) can be the following program: 
r1 ; 9, 8. 0% 7 2; 9,6. 
38 1. mo. a3 68. 
m;6, 1. 1. Fe mh, 2. 
te, 2; 7 eet A 
a; 2, @. a. P meet. 
a2 5 I, 0. 4.7 2;f,6. 
x, ; 0, 4. is. 2 2 + 4,6. 
T 234, 0. 16. T a: 6,4. 


Therefore, L(2) can be written in exactly $2^{2^2}$ instructions. 

Now suppose the lemma is true for all n, 2 ≤ n ≤ m. Consider n = 
m + 1. The library L(m) by hypothesis has not more than $2^{2^m}$ 
instructions and covers all functions of m variables among the 
$2^{2^{m+1}}$ functions of m + 1 variables. For each of the other 
$2^{2^{m+1}} - 2^{2^m}$ functions of m + 1 variables, the program can be 
written 

T $x_{m+1}$; A, B, 

where A and B refer to addresses in the library L(m). Hence L(m + 1) 
can be written with not more than 

$$\left(2^{2^{m+1}} - 2^{2^m}\right) + 2^{2^m}$$

instructions. This proves the lemma. 
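
The inductive step of Lemma 2 can be illustrated by counting instructions directly. The sketch below is in Python and is not part of the original paper. It makes one simplifying assumption that differs from the listing for L(2) in the proof: the two constant functions are treated as the exits θ and I and cost no instruction, so the count comes out slightly below the bound of the lemma. The function name `library_size` is hypothetical.

```python
def library_size(n):
    """Count the instructions of a library covering every function of n variables.

    A function of m + 1 variables is stored as the truth table of its
    cofactor for x_{m+1} = 0 followed by that for x_{m+1} = 1. If the two
    cofactors are equal, the function is already in the smaller library;
    otherwise it needs exactly one new instruction "T x_{m+1}; A, B".
    """
    known = {(0,), (1,)}      # functions of zero variables: the two exits
    instructions = 0
    for _ in range(n):
        extended = set()
        for low in known:
            for high in known:
                extended.add(low + high)
                if low != high:
                    instructions += 1   # one transfer on the new variable
        known = extended
    return instructions
```

For each n the count equals $2^{2^n} - 2$ under this assumption, which stays within the bound of Lemma 2.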

Using Lemma 2 and combining it with Lemma 1, we have 

Theorem 1: For all n, 

$$\frac{1}{2}\,\frac{2^n}{n} < \mu_n < 4\,\frac{2^n}{n}.$$

Proof: The lower bound was given by Lemma 1. To get the upper 
bound, let us write n = (n − j) + j, where j may vary from 0 to n. 
Let f be a switching function of n variables. Then f may be expanded 
about j of its variables in its canonical expansion: 

$$f(x_1, x_2, \ldots, x_n) = x_1' x_2' \cdots x_j'\, f(0, 0, \ldots, 0, x_{j+1}, \ldots, x_n) \lor \cdots \lor x_1 x_2 \cdots x_j\, f(1, 1, \ldots, 1, x_{j+1}, \ldots, x_n).$$

Now, we may write a program with exactly 

$$1 + 2 + 4 + \cdots + 2^{j-1} = 2^j - 1$$

instructions to give all the $2^j$ functions of j variables 

$$x_1' x_2' \cdots x_j', \;\ldots,\; x_1 x_2 \cdots x_j.$$

Also, by Lemma 2, we may construct a library L(n − j) with not more 
than $2^{2^{n-j}}$ instructions. Hence f can be programmed with not more than 
$2^{2^{n-j}} + 2^j - 1$ instructions. Now set 

$$j = n - [\log_2(n - \log_2 n)],$$

where [x] denotes the largest integer less than or equal to x. Then 
Qn Qn Qn 


2 - 
n — loge 


es . 


= < 
- ep lloga(n logen)]) = 


for n= 4. 
Therefore, for n = 4, 
- n bd fjo"—-) + 9) _ 1 
Ka = un 14 v4 


Now, by direct computation, we find $\mu_1 = 1$, $\mu_2 \le 4$ and $\mu_3 \le 6$. Hence 
for all n, we have $\mu_n \le 4(2^n/n) - 1$, and the theorem follows. 

Theorem 2: Given any $\epsilon$, $0 < \epsilon < 1$, a fraction $1 - 2^{-\epsilon 2^n}$ of switching 
functions of n variables will need at least $2^n(2n)^{-1}(1 - \epsilon)$ binary- 
decision program instructions to program. 

Proof: The number of possible binary-decision programs with not 
more than $2^n(2n)^{-1}(1 - \epsilon)$ instructions cannot exceed 

$$\left[\frac{2^{2n}}{4n}\,(1 - \epsilon)^2\right]^{2^n(2n)^{-1}(1-\epsilon)},$$

which is less than $2^{2^n(1-\epsilon)}$. Therefore, the fraction of switching functions 
of n variables which need not more than $2^n(2n)^{-1}(1 - \epsilon)$ instructions to 
program cannot exceed $2^{-\epsilon 2^n}$. Hence the rest must need at least 
$2^n(2n)^{-1}(1 - \epsilon)$ instructions to program and the proof follows. 

The procedure outlined in the proof of Theorem 1 yields for each 
switching function of n variables a binary-decision program. We will 
call this program the normal binary-decision program for that function. 
A close examination of this procedure will show that the number of 
program instructions executed in the computation of a particular value 
of any switching function of n variables never exceeds n. That is, in 
the computation no variable is examined more than once. Therefore, 
we have 

Corollary 1: The number of instructions which has to be executed in 
the normal binary-decision program for the computation of each value 
of any switching function of n variables never exceeds n. 

These results together give us a fairly good idea of how efficient it 
is to compute switching functions with binary-decision programs. For 
n = 20, for instance, practically all switching functions need more than 
25,000 instructions to program, although none needs more than 200,000 
instructions. The number of instructions that one needs to go through 
to compute a single value is never more than 20, however. We want 
now to compare these results with the AND-OR-SUM procedure men- 
tioned earlier. 
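
The figures quoted for n = 20 can be recomputed from the bounds above. The sketch below is in Python and is not part of the original paper. It evaluates the lower bound $2^n/2n$ of Lemma 1 and the constructive bound $2^{2^{n-j}} + 2^j - 1$ from the proof of Theorem 1 with $j = n - [\log_2(n - \log_2 n)]$; the function names are hypothetical.

```python
import math


def lower_bound(n):
    """Lemma 1: some function of n variables needs more than 2^n / 2n instructions."""
    return 2 ** n / (2 * n)


def constructive_upper_bound(n):
    """Instruction count of the construction in the proof of Theorem 1."""
    k = math.floor(math.log2(n - math.log2(n)))   # k = n - j
    j = n - k
    return 2 ** (2 ** k) + 2 ** j - 1


def worst_case_steps(n):
    """Corollary 1: at most one instruction is executed per variable."""
    return n
```

For n = 20 the lower bound is a little over 26,000, and by Theorem 2 all but a vanishing fraction of functions need nearly that many instructions; the construction stays well below 200,000, and no evaluation executes more than 20 instructions.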


4.2 An Alternative Procedure 


Before we consider the AND-OR-SUM procedure illustrated 
previously, it might be well for us to show why this particular procedure 
is chosen for comparison. A switching function is commonly written in 
terms of its variables and their complements connected by AND and 
OR. Besides AND and OR, there are eight other binary operations, 
denoted by $/$, $\mid$, $\oplus$, $\equiv$, $\supset$, $\subset$, $\not\supset$ and $\not\subset$, where we have called $\oplus$ 
the SUM operation. These can be written in terms of AND and OR 
operations: 

$$\begin{array}{ll}
x / y = x' \lor y', & x \supset y = x' \lor y, \\
x \mid y = x'y', & x \subset y = x \lor y', \\
x \oplus y = x'y \lor xy', & x \not\supset y = xy', \\
x \equiv y = xy \lor x'y', & x \not\subset y = x'y.
\end{array}$$


In order not to be restrictive with our alternative computational 
procedure, let us be allowed to use any of these 10 operations in a com- 
putation. The first thing we wish to show is that we lose nothing by 
throwing away seven of these operations. In order to do this, let us 
call any switching function expression involving the variables and their 
complements, in which any of the 10 operations may appear, a binary 
expression. Let us also say that two binary expressions are equivalent if 
they represent the same function. Then 

Theorem 3: Let f be a binary expression with r operations. Then there 
is an equivalent binary expression g having r or fewer operations such 
that the only binary operations appearing in the expression g are AND, 
OR and SUM. 

For example, the expression 

$$(x' \not\supset y) \mid [(z' \equiv u) / w']$$

is equivalent to the expression 

$$(x \lor y)\,[w'(z \oplus u)],$$

in which no operation other than AND, OR and SUM appears. 

To prove this theorem, we note that, if g is any binary-expression of 
r operations, r ≥ 1, then g is expressible as 

$$g = h * k,$$

where h and k are binary-expressions each of (r − 1) or fewer operations 
and $*$ is one of the 10 binary operations. Also, 

$$g' = (h * k)' = h *' k,$$

where $*'$ is again one of the 10 binary operations. Therefore, if g is any 
binary-expression with r operations, then its complement expression g’ 
requires not more than r operations. 

Going back to Theorem 3, we note that the theorem is true for r = 1. 
Now suppose that the theorem is true for all r, 1 ≤ r ≤ R. Let f be an 
expression with R + 1 operations. Then f is expressible in the form 

$$f = g * h,$$

where g and h are expressions each with not more than R operations and 
$*$ is one of the 10 binary operations. Since for the seven operations 
$/$, $\mid$, $\equiv$, $\supset$, $\subset$, $\not\supset$ and $\not\subset$ we have $g / h = g' \lor h'$, $g \mid h = g'h'$, 
$g \equiv h = g' \oplus h$, $g \supset h = g' \lor h$, $g \subset h = g \lor h'$, $g \not\supset h = gh'$ and 
$g \not\subset h = g'h$, it follows that f can be expressed as 

$$f = k *' m,$$

where $*'$ is one of the three operations AND, OR or SUM, k is either g 
or its complement g’ and m is either h or its complement h’. The theorem 
now follows from the previous assertion and the induction hypothesis. 
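
The seven rewritings used in the induction step can be confirmed by truth table. The sketch below is in Python and is not part of the original paper. Each operation is defined directly from its meaning (the descriptive names such as "implies" are labels chosen here, not the paper's), and is compared with AND, OR or SUM applied to g or g’ and h or h’.

```python
from itertools import product

# name -> (direct definition, connective, complement g?, complement h?)
REWRITES = {
    "g / h":              (lambda g, h: int(not (g and h)), "OR",  True,  True),
    "g | h":              (lambda g, h: int(not (g or h)),  "AND", True,  True),
    "g equiv h":          (lambda g, h: int(g == h),        "SUM", True,  False),
    "g implies h":        (lambda g, h: int(g <= h),        "OR",  True,  False),
    "g implied-by h":     (lambda g, h: int(g >= h),        "OR",  False, True),
    "g not-implies h":    (lambda g, h: int(g > h),         "AND", False, True),
    "g not-implied-by h": (lambda g, h: int(g < h),         "AND", True,  False),
}

CONNECTIVES = {
    "AND": lambda p, q: p & q,
    "OR":  lambda p, q: p | q,
    "SUM": lambda p, q: p ^ q,   # addition modulo 2
}


def rewrite_holds(name):
    """Check one rewriting on all four combinations of g and h."""
    direct, connective, flip_g, flip_h = REWRITES[name]
    for g, h in product((0, 1), repeat=2):
        k = 1 - g if flip_g else g
        m = 1 - h if flip_h else h
        if direct(g, h) != CONNECTIVES[connective](k, m):
            return False
    return True


def all_rewrites_hold():
    return all(rewrite_holds(name) for name in REWRITES)
```

Each rewriting uses a single operation, so replacing one of the seven operations never increases the operation count, which is what the induction requires.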
Because of Theorem 3, we may as well consider only those operations 
AND, OR and SUM in our computation procedure. To make specific 
comparisons possible, let us define a Boolean program as one that is 
made up of three kinds of Boolean instructions: 

AND A, B, C, 
OR A, B, C, 
SUM A, B, C. 

Here A, B or C refers to one of 4n possible locations in which values of 
the n variables and their complements as well as intermediate results 
may be stored. We have reserved 2n locations for the storage of 
intermediate results; this is sufficient for computing the most complex of 
functions of n variables. To standardize location reference we will use 
locations 1, 2, ⋯, n for storing variable values; 1’, 2’, ⋯, n’ for 
storing complement values; and 1”, 2”, ⋯, (2n)” for storing 
intermediate results. The Boolean program for the function 

$$[x \oplus (yz)]\,(y \lor z),$$

for example, can now be written 

1. OR 2, 3, 1”; 
2. AND 2, 3, 2”; 
3. SUM 1, 2”, 2”; 
4. AND 1”, 2”, 2”; 

where x, y and z values are stored in locations 1, 2 and 3 respectively; 
x’, y’ and z’ in locations 1’, 2’ and 3’ respectively; and the final result 
is in 2”. 
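
A Boolean program of this kind is easy to interpret mechanically. The sketch below is in Python and is not part of the original paper. It assumes that an instruction "OP A, B, C" combines the contents of A and B and stores the result in C, as the worked example suggests, and it writes the location names as strings.

```python
from itertools import product

OPERATIONS = {
    "AND": lambda p, q: p & q,
    "OR":  lambda p, q: p | q,
    "SUM": lambda p, q: p ^ q,   # addition modulo 2
}

# The four-instruction program for [x SUM (yz)](y OR z).
PROGRAM = [
    ("OR",  "2",  "3",  '1"'),
    ("AND", "2",  "3",  '2"'),
    ("SUM", "1",  '2"', '2"'),
    ("AND", '1"', '2"', '2"'),
]


def run(program, x, y, z):
    """Execute every instruction in order and return the contents of 2"."""
    store = {"1": x, "2": y, "3": z,
             "1'": 1 - x, "2'": 1 - y, "3'": 1 - z}
    for operation, a, b, c in program:
        store[c] = OPERATIONS[operation](store[a], store[b])
    return store['2"']


def agrees_with_original_form():
    """Compare with x(y'z v yz') v x'yz on all eight inputs."""
    for x, y, z in product((0, 1), repeat=3):
        direct = (x & ((1 - y) & z | y & (1 - z))) | ((1 - x) & y & z)
        if run(PROGRAM, x, y, z) != direct:
            return False
    return True
```

Unlike a binary-decision program, every instruction is executed on every input, which is the property the later speed comparison relies on.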


To each function f, let $\nu(f)$ be a number such that no Boolean program 
which computes f can have fewer than $\nu(f)$ instructions. Let 

$$\nu_n = \max\{\nu(f) \mid f \in F(n)\},$$

where, as before, F(n) is the set of all switching functions of n variables. 
Then 

Lemma 3: 

$$\nu_n \ge \frac{2^n}{3\log_2 n + 8}.$$

Proof: Let M(n,p) be the number of possible Boolean programs with 
p instructions. Then 

$$M(n,p) \le [3(4n)^3]^p,$$

and 

$$M(n,\nu_n) \le (192n^3)^{\nu_n}.$$

Suppose 

$$\nu_n < \frac{2^n}{3\log_2 n + 8}.$$

Then 

$$M(n,\nu_n) < (192n^3)^{2^n(3\log_2 n + 8)^{-1}} < (192n^3)^{2^n(\log_2 192n^3)^{-1}} = 2^{2^n},$$

and the proof follows. 
Lemma 3 affords us a comparison of Boolean programs and 
binary-decision programs. Combining Theorem 2 and Lemma 3, we see that, 
for n ≥ 64, the inequality 

$$\mu_n < \nu_n$$

strictly holds, and it most probably holds for much smaller values of n 
as well. In fact, we see that, for large n, $\nu_n$ is bigger than $\mu_n$ by an order 
of magnitude. It therefore follows that Boolean programs are in general 
much longer than binary-decision programs. Moreover, since for each 
computation every instruction in a Boolean program has to be executed, 
the difference in speed of computation becomes indeed astronomical. 

It has been amply clear that, although Boolean representation of 
switching circuits has been the foundation on which switching theory 
had been built, the inherent limitations in the Boolean language seem 
to be difficult hurdles to surmount. Boolean representation is algebraic 
and highly systematic, but so inflexible that it is powerless against all 
but series-parallel circuits. Moreover, as this paper shows, it is extremely 
inefficient as an instrument for computation. Binary-decision program- 
ming is our attempt at a way to get beyond these limitations. It works 
well for computation. Further studies will be required to find efficient 
ways of minimizing binary-decision programs and to make binary- 
decision programming an instrument for circuit synthesis. 
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A Method of Coding Television Signals 
Based on Edge Detection 


By BELA JULESZ 


(Manuscript received February 5, 1959) 


A method is described for transmitting digitalized video signals to reduce 
channel capacity from that needed for standard PCM. This method takes 
advantage of the inability of the human eye to notice the exact amplitude 
and shape of short brightness transients. The transmitted information con- 
sists of the amplitudes and times of occurrence of the ‘‘edge’’ points of video 
signals. These selected samples are coarsely quantized if they belong to high- 
frequency regions, and the receiver then interpolates straight lines between 
the samples. The system was simulated on the IBM 704 computer. The 
processed pictures and obtained channel-capacity savings are presented. 


I. INTRODUCTION 


There is an increasing trend in the communication field to utilize the 
physiological and psychological properties of the ultimate receiver - 
the human observer. Some of these properties were applied many years 
ago in establishing television transmission standards — for example, 
visual acuity and flicker-fusion frequency thresholds. The development 
of information theory made this trend even more apparent, particularly 
in Shannon’s first coding problem, where he posed the question of find- 
ing an optimum code for a continuous information source when the 
fidelity criterion of the receiver was given. 

Unfortunately the fidelity criteria of human observers are not known. 
This lack of knowledge is particularly apparent in visual processes, even 
though in this field the challenge of possible channel-capacity saving is 
tempting. From a theoretical standpoint, the solution of the first coding 
problem must be postponed until enough psychological data are col- 
lected. But, from a practical point of view, it is possible to overcome this 
barrier. Instead of searching for human fidelity criteria, we can proceed 
in the following simpler way. 

First, we take the present television pictures of toll quality as a stand- 
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ard. Then we process the signals of the television source by performing 
some reasonable operations which reduce information rate, and com- 
pare by mere inspection the resulting pictures with the standard pic- 
tures. If, for a well-chosen class of different pictures representing most 
of the possible cases, the results of statistical preference tests do not 
discriminate between the processed pictures and the standard toll 
quality pictures, we can regard the obtained rate of information as a 
practical upper bound. If we choose a processing which codes the con- 
tinuous source in binary digits, and assume an error-free binary channel 
for transmission (e.g., PCM as a good approximation), we can ensure 
that no further picture quality degradation will occur. Thus, the viewer 
will get the same quality of pictures he was accustomed to seeing, but 
with less channel capacity. 

As we see from the above considerations, the search for an optimum 
code becomes a trial-and-error procedure. The problem now is to find 
reasonable operations for the processing. They should be based on 
psychological facts or hypotheses and should not be too complicated for 
realization. In the last few years several ideas have been tried out along 
these lines, with more or less success.¹,²,³ The complexity of the required 
instrumentation limited or prevented a thorough investigation of these 
ideas. However, the rapid development of general-purpose digital com- 
puters has made it possible to test new ideas without actually building 
equipment. We can simulate any system on a computer by writing a 
program which converts the general-purpose computer to a special- 
purpose computer. Special input and output transducers convert the 
input pictures to sequences of digitalized numbers and, after processing 
them, reconvert the output of the computer to pictures. Such equipment 
was developed by and is used now in the Visual and Acoustical Research 
Department of Bell Telephone Laboratories as a valuable research tool.⁴,⁵ 
For the processing we use an IBM 704 computer. Although at present 
we cannot perform the simulations of television coding schemes in real 
time on the existing computers, we can evaluate many aspects of a 
system’s performance without building it. Thus, it is possible to compare 
systems and choose the best one before actual realizations. 


II. PROPERTIES OF TELEVISION SIGNALS AND OF THE HUMAN RECEIVER 
RELEVANT TO EDGE DEFINITION 


This paper describes a system which transmits only certain points of a 
television signal, depending on some given signal properties, and, after 
reception, interpolates between the points according to a given law. 
Several similar systems are described which differ only in the criteria by 
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which the transmitted points are selected and coded, and in the func- 
tion used for the interpolation.*? In our case these criteria were chosen 
to match some properties of both the usual television pictures and vision. 

Television waveforms, in contrast to acoustical signals, include fast 
transients followed by horizontal or slowly changing sections, and are 
relatively poor in damped oscillations. Because of this, a recent system® 
which transmits only the extremals of acoustical signals and interpolates 
the output at the receiver according to a given law is not suitable for 
television. We tried out this system by simulating it on the computer; 
the results, shown in the pictures on the left in Fig. 6, indicate that 
systems which perform well for acoustics may not work for vision. Furthermore, 
there is experimental evidence that the human eye is not very sensitive 
to the exact amplitude and shape of sudden brightness changes, but is 
able to locate the starting and ending points of these brightness transients 
fairly accurately. (The meaning of this property will be made clearer by 
quantitative results explained in the course of this paper.) Because of 
these properties of the source and of the ultimate human receiver, we 
chose to transmit only the end points of the brightness transients. Pro- 
vided the standard horizontal scanning technique is used, it is quite 
simple to give a mathematical criterion for selecting such “‘edge’’ points. 

To locate an edge it seems natural to require that some combinations 
of the first and higher order derivatives of the input signal should 
comprise an extremum. Now, according to the sampling theorem, the least 
rate of discrete sampling points which determine a band-limited signal 
(limited to bandwidth W) must occur at the Nyquist rate (2W). These 
samples are enough to determine also derivatives of any order. If u(t) 
is the continuous band-limited input signal and is sampled at Nyquist 
rate, which yields $u_i$ ($i = \cdots, -2, -1, 0, 1, 2, \cdots$) samples, the samples 
$u_i'$ of the derivative signal u’(t) are given by the following linear 
transformation: 

$$\mathbf{u}' = A\mathbf{u}, \qquad (1)$$

where 

$$\mathbf{u} = (u_1, u_2, \cdots, u_n); \qquad \mathbf{u}' = (u_1', u_2', \cdots, u_n')$$

and the elements of the transformation matrix A are 

$$A_{mm} = 0; \qquad A_{mn} = 2W\,\frac{(-1)^{m-n}}{m-n}.$$

For the processing on digital computers we get the input data in sampled 
and quantized form. As we see from (1), to compute only a first 
derivative would require a vast amount of computations, taking into account 
all sample values of the input signal. We can reduce the number of 
operations to a few subtractions if we introduce in place of the 
derivatives differences between sequential samples. If we now define an edge 
in terms of differences, we get a new system which resembles the previous 
one superficially, but in the microstructure (i.e., in the determination of 
a point within one Nyquist interval) the systems may differ 
considerably. We must not forget that, on account of human vernier acuity, 
ambiguities within one Nyquist interval 1/(2W) may be clearly visible. 
Because we cannot decide by mere speculation which of these systems 
will prove to be superior, we investigate the simpler one first. 

Fig. 1 — Actual system used to transmit only certain points of television signal. 


Ill. DEFINITION OF AN EDGE POINT 


The actual system is given in Fig. 1. The band-limited video signal 
u(t) is sampled at Nyquist rate, and the difference ($\Delta_i = u_{i+1} - u_i$) 
between sequential samples is computed. A three-level quantizer with 
decision levels $\epsilon$ and $-\epsilon$, and with representative levels 1, −1 and 0 
performs the following quantization: 

$$\begin{aligned}
\Delta_i' &= 1, && \text{if } \Delta_i \ge \epsilon, \\
\Delta_i' &= 0, && \text{if } |\Delta_i| < \epsilon, \\
\Delta_i' &= -1, && \text{if } \Delta_i \le -\epsilon.
\end{aligned} \qquad (2)$$

Here the $\epsilon$ decision level has to be set experimentally. If it is too small, 
the operation will be affected by trivially fine structure; if it is too large, 
the fine details in the picture will be lost. 

Now we define a sample point as an edge point ($e_i$) if the quantized 
left- and right-hand differences ($\Delta_{i-1}'$ and $\Delta_i'$) of that point belong to one 
of the six cases, given by (3) and shown in Fig. 2(a): 

$$\begin{aligned}
&\Delta_{i-1}'\,\Delta_i' < 0, \\
&\Delta_{i-1}' = 0 \text{ and } \Delta_i' \ne 0, \\
&\Delta_{i-1}' \ne 0 \text{ and } \Delta_i' = 0;
\end{aligned}$$

that is (in a more efficient notation), 

$$\Delta_{i-1}' \ne \Delta_i'. \qquad (3)$$

These cases refer to the local maxima or minima of the differences and 
to the end points of horizontal sections, provided the changes are above 
the $\epsilon$ threshold. Sample points on monotonic increasing, decreasing or 
horizontal sections will be omitted. These nontransmitted samples thus 
fall in the next three cases, shown in Fig. 2(b). 

To select (from the nine possible cases) the six cases which correspond 
to an edge point, we have to perform the operations indicated in Fig. 1. 

The output of the quantizer is again delayed one sample period and 
subtracted from its undelayed form to obtain the difference of the 
quantized left- and right-hand differences. After the second subtraction we 
get $OP = \Delta_i' - \Delta_{i-1}'$, which is nonzero for samples for which we want 
to define an edge and is zero otherwise. 

The OP signal after full-wave rectification and limitation operates as 
a gating pulse and specifies which samples have to be transmitted. 
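
The selection rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used in the paper: the difference is taken as $\Delta_i = u_{i+1} - u_i$, the first and last samples of a line are not tested because they lack one neighbor, and the function names are hypothetical.

```python
def quantize(delta, eps):
    """Three-level quantizer with decision levels eps and -eps."""
    if delta >= eps:
        return 1
    if delta <= -eps:
        return -1
    return 0


def edge_points(samples, eps):
    """Return the indices whose quantized left and right differences differ."""
    # q[i] is the quantized right-hand difference of sample i and the
    # quantized left-hand difference of sample i + 1.
    q = [quantize(samples[i + 1] - samples[i], eps)
         for i in range(len(samples) - 1)]
    # OP = q[i] - q[i - 1] is nonzero exactly when the two differ.
    return [i for i in range(1, len(samples) - 1) if q[i] - q[i - 1] != 0]
```

For a line such as `[0, 0, 0, 5, 10, 10, 10, 4, 4]` with `eps = 2`, the selected samples are the ends of the rising transient and the two ends of the falling step; samples inside the flat sections and inside the ramp are dropped.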

As the result of these operations we get samples at an irregular rate, 
the average of which is substantially less than the Nyquist rate. This 
average rate depends on the picture material and on the $\epsilon$ threshold. 
Because of the irregular occurrences of the selected samples we also must 
specify their positions in time. That requires additional information to 
be transmitted; thus, the saving in required channel capacity is less 
than the ratio between the Nyquist rate and the average rate of the 
edge points. The net saving depends considerably on the coding schemes 
we apply to transmit the values and locations of the chosen samples, as 
will be discussed later. 

Fig. 2 — (a) Six cases in which quantized left- and right-hand differences are 
not equal; (b) monotonic increasing, decreasing and horizontal sections. 

Fig. 3 — Picture material before and after processing (finer threshold setting). 


IV. INTERPOLATION 


After we specify the criteria for selecting the samples, we have to 
decide on an appropriate interpolation function. Because the selected 
samples occur less frequently than at Nyquist rate, they can be 
considered independently of each other. This means that there is no 
preferred curve connecting the selected samples from a mathematical point 
of view. From a psychological standpoint, the eye is not sensitive to the 
exact shape of a short transient, and thus the simplest choice in that 
region is a linear interpolation function. Furthermore, the longer 
monotonic increasing or decreasing sections between two edge points can be 
convex or concave and, in the average case, the best interpolation is 
again the linear one. 
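
The receiver's linear interpolation can be sketched as follows. This Python fragment is an illustration, not the paper's program; it assumes that the first and last samples of a line are always transmitted so that every sample lies between two received points, and the function name `reconstruct` is hypothetical.

```python
def reconstruct(length, points):
    """Rebuild a line of `length` samples from (index, amplitude) pairs.

    `points` must be sorted by index and must include index 0 and
    index length - 1 (an assumption of this sketch).
    """
    line = [0.0] * length
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        for i in range(i0, i1 + 1):
            # Straight line from (i0, v0) to (i1, v1).
            line[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return line
```

When the original transient is itself a straight ramp between two selected samples, as in `reconstruct(9, [(0, 0), (2, 0), (4, 10), (6, 10), (7, 4), (8, 4)])`, the line is recovered exactly; a curved transient would be replaced by its chord.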


V. COMPUTER SIMULATION 


According to the above considerations a program was written to deter- 
mine the edge points by using the criteria given in (3) and to interpolate 
straight lines between them.’ The program also provided the statistics 
of the distribution of the distances between adjacent edge points. The 
time fluctuation of the selected sample rate also was recorded. 

The picture material before processing but quantized in time and 
amplitude is shown in Fig. 3 (middle column). The picture consists of 
100 lines, each containing 120 picture elements. For synchronization and 
blanking we used 15 picture elements in every line and the complete first 
line; thus the number of picture points is 99 × 105 = 10,395. This 
resolution corresponds to a television picture 1/25 the area of the present 
standards. That means that the given pictures have to be observed from 
five times greater distance to get the usual resolution. If we take four 
times picture height as the usual viewing distance for standard 
television, the presented pictures have to be judged from a distance of 20 
times picture height. The reason for the choice of this coarser resolution 
was a compromise between acceptable picture quality and computer 
storage capacity. The amplitudes were quantized into 10 bits (1024 
levels) between the white and blacker-than-black levels, and into 9 bits 
(512 levels) between the white and black levels, although 7 bits are 
enough for excellent quality.¹ The sampling and quantization were 
performed by an analog-to-digital converter, which could perform the 
opposite operations as well. A slow-speed scanning system converted the 
pictures into electrical signals and back to pictures. The sampled and 
quantized signals were put on a tape which served as an input to the 
IBM 704. The processed output of the computer was written on tape 
too, and the same devices in reversed operation converted it into 
pictures. The pictures which are designated as “original” (Figs. 3 and 4, 
middle columns) went through all these devices, but the program of the 
computer was such that it copied the input tape unchanged onto an 
output tape. 

Fig. 4 — Picture material before and after processing (coarser threshold setting). 

After we tried the processing with several ε threshold values we got
the surprising result that, although the number of selected samples
increased with decreasing ε values, the over-all appearance became
worse. The most apparent defects were at vertical edges. The
explanation of this effect is as follows: With decreasing ε thresholds the
positioning of an edge point at the endings of horizontal sections becomes
very sensitive. A little change in slope can shift the edge points several
Nyquist intervals apart (see points ε₁ in Fig. 5). At a vertical edge each
slope of the transients differs slightly from the one in the lines above (a
small amount of added noise has the same effect), giving a very annoying
fuzziness to the vertical edge. The right-hand pictures of Fig. 6 show
the output of this system with a fine threshold setting (ε = 3.6 per cent)
and the above-mentioned defects are clearly visible. If we increase the ε
threshold, this sensitivity to edge positioning decreases, but the
quality of the pictures also decreases. The reason for this is that, by
taking increased threshold settings, we get fewer selected samples, and
thus fine details in the pictures will be lost. With small threshold values,
we get phase errors in edge positioning. Therefore, ε can be neither too
small nor too large, and even the best compromise does not ensure
adequate picture quality.

Fig. 5 — Slight change in slope (as in upper curve) moves edge points ε₁ several
Nyquist intervals apart.

Fig. 6 — (left) Distorted pictures show that systems which perform well for
acoustics may not work for vision; (right) output of system with only one fine
threshold.


VI. OBJECTIONABLE SENSITIVITY TO EDGE POSITIONING AND ITS 
CORRECTION 


There is a way to get rid of this annoying fuzziness and still be able to
choose a value of ε that is small enough. If we take two threshold values
(ε₁, ε₂) such that ε₁ < ε₂, we get two sets of edge points. We then take
the union of these two sets. In most cases the set of edge points
determined by the finer threshold contains the set determined by the coarser
threshold, and thus does not increase the number of selected points. In
the few cases when that is not the case, the additional points help to cure
the sensitivity to small slope changes. In Fig. 5 we see that the edge
points determined by ε₂ remain fixed in subsequent lines, and the
phasing errors due to the edge points given by ε₁ have no effect on the over-all
interpolation.

We determined the number of selected samples for this system using
the picture material shown in the middle column of Fig. 3. We chose
ε₂ = 10 per cent (of the peak-to-peak value between black and white)
and ε₁ = 3.6 or 5 per cent. The increase of selected samples due to the
additional samples determined by ε₂ depended on the picture material
and was small (less than 11 per cent for pictures A, B, C and 17 per cent
for picture D). The ratio of the selected samples to the total number of
samples is given in Table I in per cent.

To simplify the design of coding devices, we limited the maximal dis- 
tance to 16 Nyquist intervals. If, after determining an edge point and 
scanning further from left to right 16 Nyquist intervals, we did not find 
a next edge point, we selected a new sample 17 Nyquist intervals away 
from the previously selected sample. The frequency of occurrence of 
such a case is very small; thus, the increase due to these newly selected 
points is negligible. 

The foregoing process gives good results in nearly every case. In excep- 
tional cases, the pictures leave something to be desired. The reason for 
this and its correction are discussed next. 


VII. THE “TUNNEL EFFECT” AND ITS CORRECTION


The system discussed above selects the edge points by analyzing the
quantized differences according to (3). If the difference between
subsequent samples is less than the ε threshold, we do not transmit any
sample. Now the pictures may contain hill- or valley-like sections with
slopes so mild that the left- and right-hand differences around the
maximum or minimum are less than ε, and thus we do not select these
maximum or minimum points for transmission. The linear interpolation
between the subsequent edge points looks like a tunnel, and if tⱼ − tᵢ


TABLE I — RATIO OF SELECTED SAMPLES TO TOTAL (PER CENT)
System Setting
ε₁ = 3.6 per cent; ε₂ = 10 per cent    ε₁ = 5 per cent; ε₂ = 10 per cent
32.§ 29.1 
30. 24.0 


34.4 28.3 
17 42.0 
Fig. 7 — The tunnel effect between edge points eᵢ and eⱼ and its correction
with ε₃ and τ = 3.


(the distance between edge points eᵢ and eⱼ) is long enough, large errors
can be committed (see Fig. 7). It is possible to correct this effect in
many ways. We used the following procedure: We subtracted the
original picture from the processed one to give the interpolation errors.
A new threshold value ε₃ was chosen. Whenever the error exceeded this
threshold at time tᵢ, the routine searched for the closest edge points
left and right. If the distance, τ, on both sides was equal to or more
than three Nyquist intervals, the routine selected the sample, uᵢ, for
transmission; if the distance was less, no additional samples were
selected. Thus we left the errors uncorrected for short sections (less than
six Nyquist intervals long), utilizing the same psychological effect; i.e.,
the eye is not sensitive to the exact value of brightness changes in short
times. This last manipulation improved the picture quality further. The
number of selected points in this system is given in Table II. The
distribution of the distance between the edge points for scene B is given in
Fig. 8. Here, Pᵢ is the frequency of the distances between subsequent
edge points, and the index refers to the distances in Nyquist intervals.
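
The tunnel-effect correction just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original routine: the names `interpolate` and `tunnel_correction` are hypothetical, the edge list is assumed to be sorted and to contain the first and last sample of the line, and the distances are measured to the original edge points only (the text does not say whether newly added samples count as neighbors).

```python
import bisect


def interpolate(signal, edges):
    """Linear interpolation of `signal` between the sample positions in `edges`."""
    out = list(signal)
    for i0, i1 in zip(edges, edges[1:]):
        for i in range(i0 + 1, i1):
            out[i] = signal[i0] + (signal[i1] - signal[i0]) * (i - i0) / (i1 - i0)
    return out


def tunnel_correction(signal, edges, eps3, tau=3):
    """Add samples where the interpolation error exceeds eps3 and the nearest
    edge points on both sides are at least tau Nyquist intervals away."""
    approx = interpolate(signal, edges)
    extra = []
    for i, (s, a) in enumerate(zip(signal, approx)):
        if abs(s - a) <= eps3:
            continue  # error within threshold, nothing to correct
        pos = bisect.bisect_left(edges, i)
        left, right = edges[pos - 1], edges[pos]
        if i - left >= tau and right - i >= tau:
            extra.append(i)
    return sorted(edges + extra)


# Hypothetical gentle hill whose peak the edge criterion missed.
hill = [0, 1, 2, 3, 4, 3, 2, 1, 0]
print(tunnel_correction(hill, [0, 8], eps3=3.5))
```

In this example only the peak is added; in a section shorter than six Nyquist intervals no point can satisfy the distance condition on both sides, so the error is left uncorrected, as in the text.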

By comparing Table I with Table II we see that the tunnel effect
occurs very seldom, and that the increase in transmitted samples is
negligible. The pictures obtained by this variant are shown in the left
columns of Figs. 3 and 4. Fig. 3 corresponds to the finer threshold setting
(ε₁ = 3.6 per cent, ε₂ = 10 per cent, ε₃ = 5 per cent, τ = 3 Nyquist
intervals); Fig. 4 to the coarser setting (ε₁ = 5 per cent, ε₂ = 10 per
cent, ε₃ = 7.2 per cent, τ = 3 Nyquist intervals).


TABLE II — RATIO OF SELECTED SAMPLES TO TOTAL (PER CENT)

         System Setting
Scene    ε₁ = 3.6 per cent; ε₂ = 10 per cent;    ε₁ = 5 per cent; ε₂ = 10 per cent;
         ε₃ = 5 per cent; τ = 3                   ε₃ = 7.2 per cent; τ = 3

A        32.9                                     29.2
B        31.3                                     25.3
C        35.8                                     29.3
D        47.3                                     42.4


Fig. 8 — Distribution of the distance between subsequent edge points in
scene B.

As we see, minor modifications in the program parameters improve 
the appearance of the pictures considerably. The statistics of the selected 
points for scene B are given in Fig. 8. Another advantage of choosing 
linear interpolation becomes apparent. As we add new points to the 
original edge points according to some different criterion, we need not 
label them separately because, in the case of linear interpolation, every 
received sample can be treated equally. 


VIII. EVALUATION OF PROCESSED PICTURES


In Figs. 3 and 4 the left columns show the pictures processed
according to the last variant. This variant, which includes edge determination
by fine and coarse thresholds and tunnel-effect correction, we shall refer
to simply as “linear interpolation.” As we see, small changes in the
threshold greatly affect the number of selected samples and the picture
quality. If we decrease the thresholds, the number of selected samples
reaches an asymptotic value which, depending on the picture material,
is considerably higher than that obtained with ε₁ = 3.6 per cent. (For
example, in scene B the asymptotic ratio of selected samples to the total
is 49.3 per cent.) Nevertheless, the improvement of the processed
picture is very slight if ε₁ < 3.6 per cent. So, the setting ε₁ = 3.6 per cent,
ε₂ = 10 per cent, ε₃ = 5 per cent, τ = 3 Nyquist intervals presents a
good compromise between picture quality and information savings.
The pictures taken with settings ε₁ = 5 per cent, ε₂ = 10 per cent,
ε₃ = 7.2 per cent, τ = 3 Nyquist intervals could be taken as another
compromise, with an emphasis on economy rather than on quality.
The reason that the picture quality does not improve much with
decreasing thresholds can be explained simply: The sensitivity of the eye to
phase errors in locating the edge points within one Nyquist interval is the
major cause of the deteriorated appearance of the processed pictures.
This ambiguity within one Nyquist interval will not improve much as
we set the thresholds finer. One way to get better results would be to
specify the location of the selected edge points more accurately than one
Nyquist interval. Even a modest oversampling of a factor of two would
be advantageous. In the next section it will be apparent that this
operation will not increase the required channel capacity by more than 12 per
cent in the worst case (scene D) but would be beneficial in locating the
edge points within one-half Nyquist interval.

The above-mentioned phase errors are most disturbing on the vertical 
edges of scene A and on the outline of the face in scene B. For more 
detailed material this effect is much less objectionable. 


IX. COARSE QUANTIZATION OF FAST-TRANSIENT REGIONS 


Some recent work has exploited the same psychological phenomenon
(that is, the insensitivity of the eye to the amplitude and shape of sudden
brightness changes) from a different approach.⁹,¹⁰,¹¹ These authors
quantized the amplitude of the samples in the region of fast transients into
fewer levels than for the rest of the picture. We can add this feature
advantageously to the linear interpolation between the edges. The
obtained benefits are complementary: For pictures with many fast
transients the number of selected samples is large, but these are just the
samples which can be quantized with fewer bits. Conversely, for
pictures with fewer details (thus with fewer fast transients)
we have to specify the selected samples more accurately, at least by
7-bit quantization.

We incorporated this feature in the linear interpolation system in the
following way: The selected samples were divided in two categories.
The first category contained those edge points which were no more than
two Nyquist intervals from the left and the right neighboring edge points.
These points thus belonged to high-frequency regions and were
quantized coarsely. In the experiments, 3- and 4-bit (8- and 16-level)
quantization was tried out. The remaining edge points which were the end
points or inner points of low-frequency regions had to be quantized into
finer steps. We used here 9 bits (512 levels), as in the linear
interpolation system, but 7 bits would probably be very satisfactory. A 3-bit
quantization for the fast-transient region turned out to be very
noticeable, but 4-bit quantization gives quite satisfactory results, as the right
columns of Figs. 3 and 4 show. The ratio of the coarsely quantized
samples to the selected samples is given in Table III. Here t is the parameter
which defines the fast-transient regions and, as mentioned, was set for
two Nyquist intervals. According to this setting, edge points falling in
regions which contained oscillation higher than half of the maximum
frequency of the signal were coarsely quantized. We might have
increased t even further from a psychological point of view, but the
additional reduction in channel capacity would have been slight. In the
following section we evaluate the obtained statistics and give the
channel-capacity figures for possible coding schemes.


TABLE III — RATIO OF COARSELY QUANTIZED
SAMPLES TO SELECTED SAMPLES

         System Setting
Scene    ε₁ = 3.6 per cent; ε₂ = 10 per cent;    ε₁ = 5 per cent; ε₂ = 10 per cent;
         ε₃ = 5 per cent; τ = 3; t = 2            ε₃ = 7.2 per cent; τ = 3; t = 2

A        0.71                                     0.63
B        0.52                                     0.44
C        0.59                                     0.54
D        0.77                                     0.72
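
The division of the selected samples into two categories can be illustrated as below. This is a sketch under assumptions, not the original program: the names `classify_edge_points` and `quantize` are hypothetical, the first and last edge points of a line (which have only one neighbor) are treated as finely quantized, and a uniform quantizer with mid-step reconstruction is assumed because the text does not specify the quantizer.

```python
def classify_edge_points(edges, t=2):
    """Split sorted edge positions into (coarse, fine) lists.

    A point is coarsely quantized when both neighboring edge points lie
    no more than t Nyquist intervals away.
    """
    coarse, fine = [], []
    for k, e in enumerate(edges):
        interior = 0 < k < len(edges) - 1
        if interior and e - edges[k - 1] <= t and edges[k + 1] - e <= t:
            coarse.append(e)
        else:
            fine.append(e)
    return coarse, fine


def quantize(value, bits, full_scale=1.0):
    """Uniform quantizer (assumed): return the mid-point of the step holding value."""
    levels = 2 ** bits
    step = full_scale / levels
    index = min(int(value / step), levels - 1)
    return (index + 0.5) * step


# Hypothetical edge positions on one scan line.
coarse, fine = classify_edge_points([0, 2, 3, 5, 12, 20])
print(coarse, fine)
print(quantize(0.5, 4), quantize(0.5, 7))
```

Points 2 and 3 sit in a fast-transient region and would receive 4-bit amplitudes, while the remaining points keep the finer quantization.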


X. CODING AND AMOUNT OF CHANNEL-CAPACITY SAVINGS 


After the processing of the pictures, the second step is a subjective
evaluation of them. Provided we accept the obtained picture quality,
the next step is to evaluate the information content and the obtained
channel-capacity savings. Information theory enables us to get a
theoretical lower bound of the information content of the processed pictures,
but to realize it even approximately requires very involved coders and
decoders. Therefore, we also make computations with simpler coding
devices. Such devices do not make use of the obtained statistics of
distances between edge points, but regard all possible distances as equally
probable. Because the greatest distance between selected samples is
restricted to 16 Nyquist intervals, 4 bits are required to specify the
location of a selected sample from its previous neighbors. To describe
the amplitude of the selected sample, 7 bits are adequate. Thus, 11 bits
are required to specify the location and amplitude of a selected sample
point. In conventional systems, 7 bits are enough to specify the
amplitude of samples occurring at Nyquist rate.

If no location information had to be sent, the saving in the transmitted
information would simply be the ratio of the selected samples to the total
number of samples. Because each selected sample needs 11 bits rather
than 7, the saving is diminished in the
ratio of 11/7. If N is the total number of samples and N′ is the number
of selected samples, then the average rate of information is

$$R = 11\,\frac{N'}{N} \ \text{bits/sample}. \tag{4}$$

The coder contains a time-variable buffer storage to smooth out the
incoming signals, which arrive at an irregular rate, and to transmit
them on the channel at a constant average rate. At the receiver, the
inverse elastic operation is performed in the decoder. If N′/N < 7/11,
we get a saving in information rate over the conventional Nyquist rate
sampling.

The rate of information is computed for the linear interpolation
system with and without quantization. In the quantized case the required
rate is

$$R_q = 11\,\frac{N' - N^*}{N} + 8\,\frac{N^*}{N} \ \text{bits/sample}, \tag{5}$$

where N* is the number of coarsely quantized samples.
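
As a worked illustration of these two rates, the sketch below evaluates them for the first rows of Tables II and III (fine setting: 32.9 per cent of the samples selected, 0.71 of those coarsely quantized). It assumes the reading given in the text: 4 bits for location plus 7 bits for a finely quantized amplitude, and 4 bits for location plus 4 bits for a coarsely quantized amplitude. The function names are hypothetical.

```python
def rate_plain(selected_ratio, loc_bits=4, amp_bits=7):
    """Bits per original sample when every selected sample costs 11 bits."""
    return (loc_bits + amp_bits) * selected_ratio


def rate_quantized(selected_ratio, coarse_fraction,
                   loc_bits=4, fine_bits=7, coarse_bits=4):
    """Bits per original sample when a fraction of the selected samples
    (coarse_fraction = N*/N') is sent with the coarser amplitude code."""
    fine = (loc_bits + fine_bits) * (1.0 - coarse_fraction)
    coarse = (loc_bits + coarse_bits) * coarse_fraction
    return selected_ratio * (fine + coarse)


r = rate_plain(0.329)              # N'/N = 0.329
rq = rate_quantized(0.329, 0.71)   # N*/N' = 0.71
print(round(r, 2), round(rq, 2), round(7 / r, 2), round(7 / rq, 2))
```

Both rates fall below the conventional 7 bits per sample, since the selected ratio is well under the break-even value 7/11.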

If we take advantage of the highly peaked distribution curve of the
distance between selected samples, and use a Shannon-Fano code to
encode them, the rate of information for the linear interpolation system
without and with quantization is as follows:

$$R_{lin} = \frac{N'}{N}\,(H_X + 7), \tag{6}$$

$$R_{lin,q} = \frac{N'}{N}\,[H_X + 4k + 7(1 - k)], \tag{7}$$

where

$$k = \frac{N^*}{N'}$$
TABLE IV — INFORMATION RATE FOR PROCESSED
PICTURES (BITS PER SAMPLE)

System Setting
ε₁ = 3.6 per cent; ε₂ = 10 per cent; ε₃ = 5 per cent; τ = 3; t = 2
ε₁ = 5 per cent; ε₂ = 10 per cent; ε₃ = 7.2 per cent; τ = 3; t = 2

Scene    R    R_q    R_lin    R_lin,q    R    R_q    R_lin    R_lin,q
.92 
.94 
.27 
4.08 


22 | 3. 2. 2. 2.01 
45 2. 2.46 43 2.34 
64 | 3.2: 7 




2.76 2.28 
99 | 4. 3.7 3.72 2.80 




and

$$H_X = -\sum_{i=1}^{16} P_{Xi} \log_2 P_{Xi}.$$

The $P_{Xi}$ are the frequencies of the distances between extremals of a
given picture, X. R, R_q, R_lin and R_lin,q are tabulated for different scenes
and system settings in Table IV. The obtained information reduction is
considerable, and it is an advantageous situation that the smallest
entropy values, $H_X$, are obtained for the most involved pictures which
require the most selected samples.

If we use statistical coding (e.g., Shannon-Fano or Huffman codes),
we have to use the same code for all scenes. If we choose the code
according to a scene Y, and we have to encode a different scene X, the
expected code length in binary digits will be approximately

$$L_{XY} = -\sum_{i=1}^{16} P_{Xi} \log_2 P_{Yi}, \tag{8}$$

where $L_{XY}$ is never less than $L_{XX} = H_X$. To see how these values
compare with the entropies, we computed them for the 16 possible
combinations of scenes using the finer threshold settings. Table V shows
$L_{XY}$, which is not very sensitive to Y (i.e., to the particular code used).
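
The entropy and the mismatched code length of (8) can be computed as sketched below. The two distance distributions are hypothetical stand-ins (four distances instead of sixteen, chosen so the numbers are easy to check); they are not the measured statistics of any scene in this paper, and the function names are illustrative only.

```python
import math


def entropy(p):
    """H_X = -sum P_Xi log2 P_Xi, in bits per selected sample."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def expected_code_length(p_x, p_y):
    """L_XY = -sum P_Xi log2 P_Yi: scene X encoded with a code matched to Y.

    Assumes P_Yi > 0 wherever P_Xi > 0.
    """
    return -sum(x * math.log2(y) for x, y in zip(p_x, p_y) if x > 0)


# Hypothetical distance distributions for two scenes.
p_x = [0.5, 0.25, 0.125, 0.125]   # highly peaked at short distances
p_y = [0.25, 0.25, 0.25, 0.25]    # flat

print(entropy(p_x), expected_code_length(p_x, p_y))
```

Encoding X with the code matched to Y costs 2 bits per distance here instead of the 1.75 bits of the matched code, which illustrates why the mismatched length can never fall below the entropy.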


TABLE V — EXPECTED CODE LENGTH L_XY IN BITS


Fig. 9 — Fluctuation in time of number of selected samples at input of the
buffer storage.


The channel capacity has to be enough to transmit all possible
picture material. Because we do not know whether the selected few pictures
are good representatives of all possible entertainment pictures, we
cannot state theoretically anything definite about channel capacity, but we
hope that the results are close to reality. If we regard the pictures as a
whole, instead of as a 25th portion, and look at them from the usual
four times picture height instead of from the distance of 20 times picture
height, we can get an impression of how the quality would look for the
most crowded scenes.

The picture materials used were not far from the stationary case; i.e., 
entropy values calculated from statistics gathered across different lines 
of a picture did not fluctuate much. 


XI. BUFFER-STORAGE REQUIREMENTS 


In Fig. 9 we show how the number of selected samples fluctuates in 
time at the input of the buffer storage (curved lines). If we read out the 
data at a constant rate, we get a straight-line representation as a func- 
tion of time. If we choose this constant rate at the output as the average 
rate of the input, the straight line starts at the origin and hits the input 
curve at the end point. The maximal difference between the input and 
output curves gives a good estimate of the buffer-capacity requirements. 
The curves for scenes A and B are shown for the fine setting of the linear
interpolation system without quantization or statistical coding. The
co-ordinates are equivalent to time and are specified in terms of the number
of scanned lines. The abscissae are the number of selected samples at the
input and output of the quantizer, with the synchronization signals
added. The requirement in storage capacity is about one scanning line
(120 samples) for scene B and about four scanning lines for scene A. If
an increased output rate (dashed straight line) is used, the storage- 
capacity requirement can be reduced. 
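
The buffer estimate described in this section amounts to comparing the cumulative input with a straight line of constant slope. The sketch below assumes the read-out rate equals the average input rate and reports the largest absolute gap between the two curves; the function name and the per-line counts are hypothetical, not data from Fig. 9.

```python
from itertools import accumulate


def buffer_estimate(selected_per_line):
    """Largest gap between cumulative input and a constant-rate output line.

    `selected_per_line[k]` is the number of selected samples in scan line k.
    The output line starts at the origin and meets the input curve at the end.
    """
    n = len(selected_per_line)
    rate = sum(selected_per_line) / n          # average samples per line
    cumulative_in = accumulate(selected_per_line)
    return max(abs(c - rate * (k + 1)) for k, c in enumerate(cumulative_in))


# Hypothetical counts of selected samples in four successive scan lines.
print(buffer_estimate([10, 30, 20, 20]))
```

A scene whose detail is spread evenly over the lines stays close to the straight line and needs little storage; a scene with its detail bunched into a few lines needs more.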


XII. SUMMARY 


The above-described experiments used the inability of the eye to
notice the exact amplitude and shape of short brightness transients. By
using straight-line interpolation between edge points and coarse
quantization of edge points in fast-transient regions, we can transmit
information at a rate of 3 bits per sample or less for the given scenes and shown
picture quality. If we take the present 7 bits per sample rate as a
reference, the greatest possible saving for scene D is 7/2.99 = 2.3 times, and
for scene B it is 2.9 times. Naturally, with practical buffer-storage size
we cannot average out the differences in information rate for the
different scenes, and we have to match the channel to the worst case.

If we use additional information to specify the location of edge points 
within a Nyquist interval, the quality of the pictures will greatly im- 
prove. 

The obtained savings are modest and close to the figures achieved by
other authors. Probably the results are interesting more because of what
they reveal of visual perception than because of their immediate
engineering applicability.
Equilibrium Delay Distribution for One
Channel With Constant Holding Time,
Poisson Input and Random Service


By PAUL J. BURKE 


(Manuscript received February 19, 1959) 


The equilibrium delay distribution is found for a single-server queueing 
system with Poisson input, random service and constant holding time. Curves 
are presented for various occupancy levels, and these are compared with their 
queued-service constant-holding-time and random-service exponential-hold- 
ing-time counterparts. 


In many situations involving waiting lines—for example, when 
customers are being served at a bargain counter in a crowded store — 
the ideal queue discipline (service order) of first-come first-served is 
not achieved. Instead, the service order tends to be at least somewhat 
random, and the probability of long delays is thereby increased over 
what it would be for the strict first-come first-served discipline. Un- 
fortunately for the analyst, when the queue discipline is somewhere 
between order-of-arrival and random — as is often the case in practice — 
the problem of calculating the delay distribution seems to be intractable. 
If the service order is assumed to be actually random, however, then 
this problem can sometimes be solved, and the delay distribution thus 
found is useful as a kind of bound on the distributions to be expected 
in those cases where the queue discipline deviates from order-of-arrival 
service toward randomness. 

The term “bound” as used here does not mean a bounding function 
in the strict sense. Actually, the delay distributions for models which 
differ only in queue discipline will cross each other, and hence no individ- 
ual distribution can be a true bound for a family of such distributions. 
However, the longer delays are generally of more interest in waiting- 
line problems than are the shorter ones, and it is true that, other things 
being equal, the probability of sufficiently long delays is greater for 
random service than for order-of-arrival service. 

Although the assumption of random service does not provide this type
of bound in all situations, as would the assumption that the service order 
is last-come first-served, it is clearly more realistic than the latter 
assumption in many cases and will provide a closer bound, or approxima- 
tion, to the actual delay distribution in these cases. 

In addition to its usefulness as a boundary case, a queueing model 
which involves random service and constant holding times is of direct 
interest in certain telephone switching applications. For example, it is 
approximated, under some circumstances, at the marker connectors of 
the No. 5 Crossbar system, and it may have further application in 
electronic switching systems. 

Of all possible holding-time distributions, the exponential and constant
distributions have been studied most intensively in connection with
queueing systems. The delay distribution was first obtained for a
queueing system with constant holding times at least as early as 1909 by
Erlang (Ref. 1, pp. 133-137). A solution to the delay problem for
exponential holding times was published in 1917 by the same author (Ref.
1, pp. 138-155). In both of these cases the service was order-of-arrival
(queued) rather than random. The first attempt to obtain a delay
distribution when the service order was random was published in 1942 by
Mellor,² but this was not completely correct. A correct formulation of
a random-service problem was obtained in 1946 by Vaulot,³ the solution
to which was given by Pollaczek.⁴ The same problem also was solved
by Palm.⁵ A method for computing the delay distribution was published
by Riordan in 1953.⁶ The random-service problem solved by Vaulot
and Pollaczek involved an exponential holding-time distribution. The
present study is apparently the first attempt to combine the queue
discipline of random service with holding times which are constant.

The model considered here is characterized by the following: 

i. Random input — the probability that a call will arrive during any 
infinitesimal interval of length dt is proportional to dt within infinitesi- 
mals of higher order, and is independent of the state of the system, 
arrival times of previous calls or any other conditions whatever. It is 
equivalent to say that the call arrivals constitute a Poisson process. 

ii. Constant holding times — the service time of each call is the same 
constant, taken here to be unity. 

iii. Random service — if there are n calls waiting for service at the
instant of a completion of service, the probability that any particular
one of the calls will be served next is 1/n. The server is never idle when
there are calls waiting to be served.

iv. No defections — all calls wait in the system until they are served.

v. Statistical equilibrium — the distribution of the number of calls
in the system is stationary, i.e., independent of time. Under the above 
assumptions, this will be the case when the arrival rate is less than one 
call per service interval and the system is given enough time to “settle 
down.” Mathematically, the condition is assured by the assumption 
that the initial distribution of the number of calls in the system is the 
equilibrium distribution. 

The over-all delay distribution is obtained below by decomposing it 
into a weighted sum of conditional delay distributions, depending on the 
state of the system at the epoch (instant) of the first departure (comple- 
tion of service) following the arrival of the call. It suffices to define the 
state of the system at the departure epochs as the number of calls re- 
maining in the system. (The call just completing service is not counted.) 
Each delay consists of two parts. The first part of the delay is the time 
from the arrival of the call in question until the first departure epoch 
following the arrival. The second part is the time from this departure 
epoch until the call in question gains service. The first part has a
continuous distribution over the interval [0, 1]; the second part is
distributed over the nonnegative integers.

Thus, in Fig. 1 the call that arrives at time a2 suffers a delay d; — az, 
which may vary from zero to a full holding time, until the first 
departure after its arrival. At time d; the call that arrived at a, will 
surely gain service, since it is the only call in the system at that time, 
and hence the integral part of the delay for this call will be zero units of 
time with probability one. In contrast, the call which arrives at a; will 
have to compete for service at d, with another call, and hence the integral 
part of its delay will have a probability of one-half to be zero and an 
equal probability to be greater than zero. In general, the integral part
of the delay will have a (discrete) probability distribution that depends
on the number of calls in the system at the first opportunity that the
call in question has to gain service. The determination of this probability
distribution is the major portion of the task of evaluating the over-all
delay distribution.

Fig. 1 — Number of calls in the system as a function of time

It might be pointed out that the equilibrium state probabilities are 
the same whether they are considered over the whole process or only 
over the set of discrete instants of time consisting of the departure 
epochs. This is shown by the fact that the generating function for the 
state probabilities given by Kendall,⁷ which refers to the departure
epochs, is the same, when specialized to constant holding times, as that
of Crommelin⁸ specialized to one server; and that the latter refers to
the entire process. 

What is needed now is the probability that a call, conditional on its 
being delayed, will be one of n calls remaining in the system at the first 
departure epoch after its arrival. It turns out that the latter probability 
is the same as the unconditional probability of n — 1 calls in the system, 
as shown by the following argument. 

An arriving delayed call will be one of n calls in the system just after 
the next departure following its arrival in one of n + 1 mutually exclu- 
sive ways: there were k calls in the system at the last previous departure 
epoch before the arrival of the call in question and n — k other calls 
arrived during the service interval, k = 1,---, n; or there were zero 
calls in the system at the last previous departure epoch and n — 1 other 
calls (besides the call being served) arrived during the service interval. 
If Pr{n | λ} represents the desired probability and $P_k(\lambda)$ represents the
unconditional probability of k calls in the system when λ is the arrival
rate,

$$\Pr\{n \mid \lambda\} = [P_0(\lambda) + P_1(\lambda)]\, p(n-1, \lambda) + \cdots + P_n(\lambda)\, p(0, \lambda), \tag{1}$$

where p(k, λ) is an individual Poisson term, i.e., the probability of k
arrivals during a service-time interval. However, (1) is exactly
Crommelin’s equation for $P_{n-1}$ in the one-server case. Therefore

$$\Pr\{n \mid \lambda\} = P_{n-1}(\lambda). \tag{2}$$


With the dependence on λ suppressed hereafter, let the conditional
probability that the delay is not greater than t, given that the delayed
call is one of n calls waiting for service at the first post-arrival departure
epoch, be denoted G(t | n). Let the resultant delay distribution for
delayed calls be denoted F(t). Then

$$F(t) = \sum_{n=1}^{\infty} P_{n-1}\, G(t \mid n). \tag{3}$$

It remains to evaluate G(t | n).
Owing to the queue discipline of random service, the delay suffered
by a call between the instant of its arrival and the first post-arrival
departure epoch is independent of the delay subsequent to this epoch.
Also, it is known that the initial delay of a fractional part of a holding
time is uniformly distributed over a service interval, owing to the
property of random arrivals. Furthermore, conditioning on the number in
the system at the first post-arrival departure epoch does not affect the
independence property or the uniform distribution of the fractional
delay (as would, for example, conditioning on the number in the system
at the instant of arrival). Let the delay be represented by T, the integral
part of T by T′ and the fractional part by T″. Similarly, let the quantity
t be decomposed into t′ and t″. Then

$$G(t \mid n) = \Pr\{T \le t \mid n\} = \Pr\{T' < t' \mid n\} + \Pr\{T' = t' \mid n\}\,\Pr\{T'' \le t''\} \tag{4}$$

because of the independence of the integral and fractional parts of the
delay. Or

$$G(t \mid n) = \sum_{i=0}^{t'-1} \Pr\{T' = i \mid n\} + t''\,\Pr\{T' = t' \mid n\} \tag{5}$$

because of the uniform distribution of T″.
It is not difficult to write a formula for Pr{T′ = i | n}. First,


$$\Pr\{T' = 0 \mid n\} = \frac{1}{n},$$


since, at the first post-arrival departure epoch, the delayed call is one
of n calls equally likely to be served. Also


$$\Pr\{T' = 1 \mid n\} = \left(1 - \frac{1}{n}\right) \sum_{j_1=0}^{\infty} \frac{p(j_1)}{n - 1 + j_1},$$


where $p(j_1)$ represents the Poisson probability of $j_1$ arrivals in a service
interval, since, if the delayed call is not served at the first opportunity,
any number of calls from zero upward may arrive during the next complete
service interval. Extending this reasoning, one has for i > 0,

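$$\Pr\{T' = i \mid n\} = \left(1 - \frac{1}{n}\right) \sum_{j_1, \ldots, j_i} \left[\prod_{k=1}^{i} p(j_k)\right] \left[\prod_{k=1}^{i-1} \left(1 - \frac{1}{n - k + \sum_{s=1}^{k} j_s}\right)\right] \frac{1}{n - i + \sum_{s=1}^{i} j_s}, \qquad (6)$$

where the sums extend over all non-negative $j_1, \ldots, j_i$ for which
$\sum_{s=1}^{k} j_s \ge k - n + 1$ for each k.
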

Although (6) can be written down directly, it is more convenient
computationally to use a recursive formula for the probabilities. (This
was pointed out to the author by W. S. Hayward, Jr.) Let


$$\Pr\{T' = i \mid n\} = Q_i(n).$$


Then


$$Q_0(n) = \frac{1}{n}$$


and


$$Q_i(n) = [1 - Q_0(n)] \sum_{j=0}^{\infty} p(j)\, Q_{i-1}(n + j - 1). \qquad (7)$$


It is clear that (6) is the solution of (7). The delay distribution, F(t),
is obtained by substituting (6) into (5) and the latter into (3). The
values of $P_{n-1}$ necessary for evaluating (3) are obtained recursively
from (1) and (2), together with the relation $P_0 = 1 - \lambda$.
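
The recursion lends itself to a short numerical sketch. The Python below is an illustration only, not the program used for the published curves: it assumes the Poisson term p(j) has mean equal to the occupancy (with the holding time as the unit of time), truncates the infinite sum at a hypothetical cutoff `jmax`, and uses function names (`make_q`, `g`) chosen here for convenience.

```python
import math
from functools import lru_cache


def make_q(lam, jmax=60):
    """Return q(i, n) = Pr{T' = i | n} from the recursion, sum cut at jmax."""
    # p[j]: Poisson probability of j arrivals in one service interval
    # (mean lam is an assumption: occupancy with unit holding time).
    p = [math.exp(-lam) * lam**j / math.factorial(j) for j in range(jmax + 1)]

    @lru_cache(maxsize=None)
    def q(i, n):
        if i == 0:
            return 1.0 / n           # Q_0(n) = 1/n
        if n == 1:
            return 0.0               # a lone waiting call is served at once
        not_served = 1.0 - 1.0 / n   # 1 - Q_0(n)
        return not_served * sum(
            p[j] * q(i - 1, n + j - 1) for j in range(jmax + 1)
        )

    return q


def g(t, n, q):
    """G(t | n): whole intervals below t', plus the fraction t'' of the next."""
    whole = int(math.floor(t))
    frac = t - whole
    return sum(q(i, n) for i in range(whole)) + frac * q(whole, n)


q = make_q(lam=0.5)
print(q(1, 2), g(1.5, 2, q))
```

Memoizing `q` keeps the cost manageable, since each level of the recursion reuses the values of the level below it for many values of n.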

The results of the calculations are shown as falling distributions of 
delays for all calls. That is, λ[1 − F(t)] is plotted rather than F(t), in
keeping with custom in the field of delay theory. The distributions are 
shown in Fig. 2 for delays up to 14 holding times and, in Fig. 3 on a 
compressed scale, for delays up to 130 holding times. 

As an example of the use of the curves, suppose a single marker whose 
holding time is 0.1 second serves calls at random and that it is desired 
to limit the probability of a delay greater than 2 seconds at this marker 
to 0.0001. It is required to find the permissible occupancy. Since a delay 
of 20 holding times is involved, Fig. 3 should be consulted. On this 
chart, the occupancy for a probability equal to 0.0001 of delay t/h = 20 
is found to be just above 0.60, roughly 0.61. Thus, the marker can handle 
a random input averaging 6.1 calls per second. 
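
The arithmetic of this example can be retraced in a few lines. This is a sketch only; the occupancy of 0.61 is the value read from Fig. 3 in the text, not something the code computes, and the variable names are chosen here for illustration.

```python
holding_time_s = 0.1   # marker holding time, from the example
delay_limit_s = 2.0    # delay that should rarely be exceeded
occupancy = 0.61       # read from Fig. 3 at probability 0.0001

# Delay expressed in holding times selects the chart and the abscissa.
delay_in_holding_times = delay_limit_s / holding_time_s

# Occupancy is load times holding time, so the load is occupancy / h.
calls_per_second = occupancy / holding_time_s

print(delay_in_holding_times, calls_per_second)
```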

In some applications in which service is not precisely order-of-arrival, 
it may be presumed that the delay distribution will lie between those 
for random and queued service. In such cases, the delay distributions 
will fall in a band bounded by random service and queued-service 
(Crommelin) curves. Examples of such bands are shown on Fig. 4. It 
should be noted that the bounding curves for any occupancy must cross, 
since the average delay is independent of the queue discipline. 

It is of some interest also to compare the random-service delay
distributions for constant and exponential holding times. It is conjectured
that a pair of such curves for a given occupancy defines a band containing
all random-service delay distributions, for that occupancy, where the
holding-time distribution is of gamma type in which the coefficient of
variation is not greater than unity. (In particular, the χ² distributions
with two or more degrees of freedom are of this type.) Several such pairs
of curves are shown on Fig. 5. (The exponential-holding-time curves are
based on Wilkinson.*) Here, of course, the curves do not cross: the
exponential-holding-time curves always (except at t = 0) lie above their
constant-holding-time counterparts.



Evaluation of Solderless Wrapped


Connections for Central 


Office Use 


By S. J. ELLIOTT 


(Manuscript received September 10, 1958) 


In the development of solderless wrapped connections for telephone central 
office applications, the general reliability objective has been that the connec- 
tions should remain mechanically secure and electrically stable during manu- 
facture, shipment and installation and for 40 years thereafter in actual 
service. Destructive mechanical tests have been used to evaluate the me- 
chanical properties of the connections. Combinations of elevated tempera- 
tures and mechanical disturbances have been used to accelerate the aging 
processes that tend to cause electrical instability. The results of such tests 
have provided considerable assurance that properly designed and properly 
made solderless wrapped connections will perform satisfactorily for 40 
years in central office service.


I. INTRODUCTION 


Since 1952, when the solderless wrapped connection first was described 
publicly,¹ its use in telephone central offices has grown steadily. Several
hundred million solderless connections are being wrapped annually in 
the Bell System today, and the growth is continuing. 

The tangible and immediate results of solderless wrapping have been 
gratifying. For example, the use of solderless wrapping has reduced 
manufacturing costs by speeding up many wiring operations and by re- 
ducing troubles caused by wire clippings and solder splashes. Further- 
more, since solderless wrapping avoids the risk of damaging heat- 
sensitive insulation by soldering operations, it has made widespread 
use of plastic-insulated wire practicable, and this is leading to sub- 
stantial additional savings. 

In the end, however, these savings could be illusory if the use of solder-
less wrapped connections degraded telephone service or increased the
maintenance effort required in the telephone plant. The laboratory
evaluation of solderless wrapped connections has been continued, there- 
fore, in order to assess the risk of deterioration in service and to provide 
guidance for the design of connections that are most likely to be reliable. 
This work has revealed certain limitations of solderless wrapped connec- 
tions, but, at the same time, it has provided considerable assurance that 
properly designed and properly made connections will be reliable in cen- 
tral office service. 

Many persons have inquired about the methods used for evaluating 
the capabilities of solderless wrapped connections and about the results 
that have been obtained since publication of earlier articles.?-* Since the 
inquiries continue undiminished year after year, it appears that there is 
sufficient interest in solderless wrapping to warrant another paper on the 
subject. An attempt is made here, therefore, to bring the story on evalua- 
tion up to date. 


1.1 Description of Solderless Wrapped Connection


A solderless wrapped connection is made by wrapping a wire tightly
around a terminal, and the connection is held together thereafter by the
elastic stresses left in the two members. A typical solderless wrapped
connection is shown in Fig. 1.


Fig. 1 — Typical solderless wrapped connection.

For Bell System applications, a minimum of five turns of wire is speci-
fied when No. 22 gauge wire is used, and a minimum of six turns is 
specified when No. 24 gauge wire is used. The wire should be closely 
wound on the terminal, but turns of wire should not overlap. 

Only solid copper wire has been approved for solderless wrapping. 
The use of stranded wire presents a number of difficulties and has not 
been investigated in detail. 


1.2 Reliability Objectives 


The general reliability objective in the development of solderless 
wrapped connections has been that the connections should remain 
mechanically secure and electrically stable during manufacture, shipment 
and installation and for 40 years thereafter in telephone central office 
service. 

The mechanical security objective cannot be defined in absolute terms,
for too little is known about the magnitudes and distributions of the
disturbances to which wires are subjected in service. As a comparative
objective, however, solderless wrapping should not increase the occur-
rence of broken wires and loose connections beyond the levels that now
prevail with soldered connections.

The electrical stability objective can be stated more specifically. Not 
more than one connection in 10,000 should exhibit resistance fluctuations 
greater than 0.1 ohm in service. The electrical noise produced by re- 
sistance fluctuations of this magnitude is considered to be at the thresh- 
old of transmission impairment in the most critical transmission circuits 
now in service. 


II. ELECTRICAL MEASUREMENTS


The method used for checking the electrical stability of solderless 
wrapped connections has been described fully by Van Horn. In essence, 
it consists of a resistance measurement made by the familiar voltmeter- 
ammeter method. The voltage drop across the connection is measured 
with a moving coil millivoltmeter while a direct current of 0.1 ampere 
flows through the connection. 

The measurement of interest is the variation of resistance when the 
connection is disturbed mechanically. The disturbance is created by 
plucking the terminal — that is, by slowly deflecting the free end of the 
terminal a predetermined distance and then releasing it abruptly, allow-
ing terminal and connection to vibrate freely. The resistance variation
(ΔR) is the difference between the maximum and minimum resistance
values observed during the disturbance. 

Although the moving-coil millivoltmeter is too sluggish to follow rapid 
fluctuations of resistance, the measurement is surprisingly sensitive. In a 
series of measurements made by an appropriate electrical noise meter 
while connections were subjected to vibration of the sort encountered in 
service, the measured noise levels consistently were lower than the levels 
calculated from the results of the simple ΔR measurements. It was con-
cluded that the simple ΔR measurement would be adequate for routine
development tests and, since it could be made far more rapidly than any 
of the more refined measurements that had been explored, it was adopted. 


2.1 Analysis of ΔR Measurements


Before reviewing the accelerated aging tests that have been employed 
in the evaluation of solderless wrapped connections, it is necessary to 
describe the method that will be used to summarize the results. This 
method, a product of hindsight rather than foresight, has been used for 
only a short time in development of the solderless wrapped connection. 

A major problem in the evaluation, of course, has been to demonstrate
by means of comparatively small samples that the probability of ΔR
exceeding 0.1 ohm is no more than one in 10,000. The expense of testing
large enough samples to determine the frequency distribution of ΔR in
every case would have been prohibitive. Over a period of years, however,
a moderately large number of tests were made on a few particular types
of connections. The cumulative sample sizes in those cases, although still
small compared with 10,000, were large enough to warrant attempts at
curve fitting.

As measured by the chi-square test, the expression that seems to fit
the observed distributions best is obtained by first grouping the ΔR data
in cells as follows:

    Cell Number,    ΔR                      Number of Connections
    x               (Ohms)                  in Cell

    0               0     ≤ ΔR ≤ 0.001      n_0
    1               0.001 < ΔR ≤ 0.01       n_1
    2               0.01  < ΔR ≤ 0.1        n_2
    3               0.1   < ΔR ≤ 1          n_3
    etc.

The arithmetic mean of the grouped distribution then is calculated as
follows:


$$m = \frac{0 n_0 + 1 n_1 + 2 n_2 + 3 n_3 + \cdots}{n_0 + n_1 + n_2 + n_3 + \cdots} = \frac{n_1 + 2 n_2 + 3 n_3 + \cdots}{N}, \qquad (1)$$


where N is the total number of connections in the sample.
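Once the mean, m, has been calculated, the probability of finding a
connection in cell x can be expressed as


$$P_x = \frac{m^x e^{-m}}{x!}. \qquad (2)$$


Bearing in mind that 0! = 1, the probabilities associated with the first
few cells become


$$P_0 = e^{-m}, \qquad P_1 = m P_0, \qquad P_2 = \frac{m P_1}{2}, \qquad P_3 = \frac{m P_2}{3}. \qquad (3)$$
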

The reader may recognize that (2), the general expression for $P_x$, is
the same as the expression for the Poisson distribution. The physical
interpretation, however, is different. In the Poisson distribution, $P_x$ is
the probability that x defectives will occur in a random sample of size N
if m is the expected average number of defectives in a sample of that
size. As a description of the ΔR distribution, however, $P_x$ is the probabil-
ity that ΔR, for a single connection chosen at random, will fall between
the limits defined by the cell number x. For a random sample of N con-
nections, then, the number of connections expected to fall in cell number
x will be $N P_x$.
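
The grouping and fitting procedure can be sketched in Python as follows. This is an illustration under stated assumptions: the cells are taken to be decades with upper edges at 0.001, 0.01, 0.1 and 1 ohm, the sample counts are hypothetical, and the function names are chosen here rather than taken from the paper.

```python
import bisect
import math

# Assumed upper edges of cells 0..3, in ohms (decade cells).
UPPER_BOUNDS_OHMS = (0.001, 0.01, 0.1, 1.0)


def cell_number(delta_r_ohms):
    """Cell index x for one resistance-variation reading."""
    return bisect.bisect_left(UPPER_BOUNDS_OHMS, delta_r_ohms)


def grouped_mean(counts):
    """m = (0*n0 + 1*n1 + 2*n2 + ...) / N for counts [n0, n1, n2, ...]."""
    total = sum(counts)
    return sum(x * n for x, n in enumerate(counts)) / total


def cell_probability(x, m):
    """P_x = m**x * exp(-m) / x!, the Poisson-form expression."""
    return m**x * math.exp(-m) / math.factorial(x)


counts = [45, 3, 0, 0]   # hypothetical sample of 48 connections
m = grouped_mean(counts)
expected = [sum(counts) * cell_probability(x, m) for x in range(len(counts))]
print(m, expected)
```

Comparing `expected` with `counts` cell by cell is the input a chi-square test of fit would need.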

In general, the agreement between (2) and observed ΔR distributions
has been best for the more stable types of connections: types in which
high values of ΔR rarely occur and for which m is small. These, of course,
are the types of connections that are desirable. For less stable types of
connections (the types that would be rejected as unreliable) the
agreement has been poorer. The dividing line between good agreement
and poor agreement appears to be somewhere in the vicinity of m = 0.3.

Fig. 2 Examples of the ΔR distribution defined by (2), for m = 0.040 and
m = 0.086. Shading indicates cells for which ΔR exceeds 0.1 ohm.

The mean, m, is a convenient statistic to use in summarizing the re-
sults of a group of ΔR measurements, and it will be used for that purpose
in the discussion of accelerated aging tests. Qualitatively, low values of
m indicate stable connections and high values indicate unstable connec-
tions. Quantitatively, in those cases where m is less than about 0.3, m
defines the observed ΔR distribution very effectively.

The distribution corresponding to m = 0.086 (shown in Fig. 2) is of
particular interest. It represents the case for which there is one chance in
10,000 that ΔR will exceed 0.1 ohm. Consequently, a value of m = 0.086
in service is the maximum value that will satisfy the stability objective
for central office use of solderless wrapped connections in the Bell Sys-
tem.
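
The link between m = 0.086 and the one-in-10,000 objective can be checked numerically. The sketch below assumes that a resistance variation above 0.1 ohm corresponds to cell 3 or higher, so that the probability sought is one minus the sum of the probabilities of cells 0, 1 and 2; the function name is chosen here for illustration.

```python
import math


def prob_exceeds_tenth_ohm(m):
    """Probability of falling in cell 3 or higher under the fitted form."""
    p0 = math.exp(-m)
    # Cells 0, 1 and 2 have probabilities p0, m*p0 and m*m*p0/2.
    return 1.0 - p0 * (1.0 + m + m * m / 2.0)


print(prob_exceeds_tenth_ohm(0.086))   # close to 1 in 10,000
print(prob_exceeds_tenth_ohm(0.040))   # roughly ten times smaller
```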

One more point needs to be made before proceeding to the discussion 
of accelerated aging tests. The variance of the distribution described by 
(2) is equal to the mean, m. It follows, then, that the standard deviation 
is equal to the square root of m. This relationship will be used later in 
developing criteria for deciding whether or not connections are stable 
enough to be approved. 


III. ACCELERATED AGING TESTS


Fairly early in the development of solderless wrapped connections, it
was concluded that electrical instability, if it occurred at all, would result
principally from relaxation of stresses in the wire and terminal (the
stresses that hold the connection together). Since that time, all aging
tests employed in the evaluation of solderless wrapped connections have
included elevated temperatures to accelerate the relaxation process.
The work of Mason? and others indicated that the stress in copper
wire (measured with respect to its value one day after a connection was
wrapped) would relax about 40 per cent in 40 years at room temperature.
To allow for some error in the room temperature prediction, and to allow
for the fact that temperatures in enclosed equipments may approach
55°C in some cases, it has been assumed that the stresses in solderless
connections wrapped with copper wire may relax as much as 50 per cent
under central office conditions. That has been the degree of stress relaxa-
tion aimed at in the various accelerated aging tests.

One of the early tests consisted simply of baking the connections for 
three hours at 175°C. (Mason had shown that the stresses in connections 
wrapped with copper wire would relax 50 per cent in about 2½ hours at
175°C, and the extra half-hour was needed to bring the specimens from 
room temperature up to the oven temperature.) At the end of the heat 
treatment, the connections were cooled to room temperature and then 
were checked for electrical instability. The general experience was that 
any solderless wrapped connection which was stable before the heat 
treatment still would be stable after the heat treatment. 

Although such results were encouraging, they were, at the same time, 
disconcerting. For example, some of the connections that behaved so 
well in the 175°C test were wrapped on terminals that scarcely could be 
considered good mechanical structures for supporting stresses over long 
periods. The twin-wire terminal of the wire-spring relay, in particular, 
fell in the questionable category. At that time it consisted simply of two 
parallel nickel silver wires which were bonded together by being dipped 
in soft solder and then serrated. The parallel wires by themselves did not 
have sufficient torsional stiffness to support the stresses that are required 
to hold a solderless wrapped connection together; and soft solder, be- 
cause of its cold flow properties, is a notoriously poor material for sup- 
porting stresses. It was doubtful that the soft solder, at room tempera- 
ture, could maintain the stresses long enough at levels high enough for 
solid state diffusion to occur. Since 175°C was not far below the softening 
temperature of the solder, there was at least a suspicion that something 
akin to a soldered joint had been produced in the accelerated aging test. 

There were questions, also, about the metallurgical behavior of the 
copper wire at the elevated temperature. It was known, for example, that 
recrystallization is more likely to occur in copper at 175°C than at lower 
temperatures. 

In short, it was suspected that a temperature as high as 175°C might
do more than accelerate the aging process: it might alter the physical
nature of the aging process itself. If this were so, then the 175°C test
might give a false picture of the aging that would occur in actual service.


3.1 105°C Aging Test 


It was decided, therefore, that some exploratory aging tests should be 
run at a substantially lower temperature. For several reasons, a tempera- 
ture of 105°C was chosen. It was well below the softening temperature 
of tin-lead solders; it was low enough so that there should be little likeli- 
hood of recrystallization occurring in the copper wire; yet it was high 
enough to produce the desired stress relaxation in a few months. Mason’s 
work had indicated that the stresses in the wrapped connections would 
relax 50 per cent in about 150 days at 105°C, so the test period was set 
at 150 days, or approximately five months. 

As compared with the desired service life of 40 years, the three-hour 
175°C test represented a 100,000:1 acceleration of the stress relaxation 
process. Since a five-month 105°C test would represent an acceleration of 
only 100:1, it was expected to be a far more realistic aging test. 
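
The two acceleration factors follow directly from the test durations. A brief sketch, assuming an average year of 365.25 days (the paper does not state the year length it used):

```python
HOURS_PER_YEAR = 365.25 * 24   # assumed average year length

service_life_h = 40 * HOURS_PER_YEAR

# Three-hour bake at 175°C versus 40 years of service.
accel_175c = service_life_h / 3

# 150-day test at 105°C versus 40 years of service.
accel_105c = service_life_h / (150 * 24)

print(round(accel_175c), round(accel_105c, 1))
```

Both results round to the order-of-magnitude figures quoted in the text, 100,000:1 and 100:1.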

Five months is a long time, however, to wait for test results. It was 
decided, therefore, that the test connections should be measured peri- 
odically during the aging test, so that any instability would be detected 
as soon as it occurred. It is important to note that this decision intro- 
duced two more changes in the accelerated aging test: (a) it subjected 
the connections to temperature cycling, for they were removed from the 
oven and allowed to cool whenever they were measured, and (b) it sub- 
jected the connections to mechanical disturbances while they were being 
aged, for the terminals were plucked whenever resistance variations were 
measured. 


The first connections subjected to the 105°C aging test were wrapped
on the solder-dipped twin-wire terminals described earlier. Connections
of that type had survived the 175°C test without showing any evidence 
of instability. In the 105°C test, however, they soon began to exhibit 
measurable resistance fluctuations, and they grew more and more un- 
stable as the test continued. The history of that first group of connec- 
tions is plotted in terms of the statistic m in Fig. 3. It was evident that 
the 105°C aging test with periodic cycling and measurement was more 
severe than the undisturbed 175°C test. 

Fig. 3 — Effect of 105°C accelerated aging test on solderless connections
wrapped with No. 24 gauge copper wire on solder-dipped twin-wire terminals
of wire-spring relay. Sample size 48.

Subsequently an experiment was performed to compare the relative
effects of temperature, temperature cycling and mechanical disturbance.
One group of connections was aged at 105°C, cooled to room tempera-
ture once each week and measured (plucked) while at room temperature
on alternate weeks. At the end of 150 days, the value of m for that group
was 2.1.

A second group of similar connections was aged at 105°C and cooled
to room temperature once each week, but was not disturbed by measure-
ments. After 150 days, m was 0.2.

A third group was aged continuously at 105°C without either tem-
perature cycling or mechanical disturbance. After 150 days, m was 0.04.

A fourth group was maintained continuously at room temperature but
measured every two weeks. After 150 days, m was 0.02.

Three important inferences were drawn from the results of this experi- 
ment: 

i. Although solid state diffusion may occur during undisturbed aging 
of solderless wrapped connections, thus tending to offset the detrimental 
effects of stress relaxation, repeated mechanical disturbances during the 
aging process can impede diffusion and can lead to unstable connections. 

ii. Since connections may be disturbed from time to time in service, 
accelerated aging tests should include periodic mechanical disturbances. 

iii. Temperature cycling alone can produce a form of mechanical dis- 
turbance if the temperature coefficients of expansion of wire and terminal 
are not equal. 

Another related experiment was performed to study the effects of the
frequency with which connections were disturbed during the 105°C aging
test. The more frequently the connections were disturbed, the sooner
they became unstable. A few of the test results are shown in Fig. 4 to
illustrate the pattern.

Fig. 4 Effects of varying the frequency of disturbance in 105°C accelerated
aging test. Connections wrapped with No. 24 gauge tinned copper wire on
0.010 × 0.062-inch flat nickel silver terminals. Sample size 48 in each case.

Long before these two experiments were completed, it had been neces-
sary to standardize an accelerated aging test so that the development
of suitable terminals could proceed without further delay. The 150-day
105°C test was adopted, primarily because it was the only aging test
up to that time that had revealed highly significant differences among
various types of connections, differences that usually were consistent
with qualitative analyses of the various mechanical structures. The con-
nections were cooled to room temperature once each week, and their
resistance variations were measured at room temperature every other
week. Each connection was plucked two or three times during the ΔR
measurement. Eventually, a plucking amplitude of 3; inch was stand- 
ardized. 

A standard procedure for the preparation of test specimens also
evolved gradually. It has become the usual practice now, in preparing
each group of test connections, to use two wrapping bits (one that wraps
more tightly than the average bit, and one that wraps less tightly), two
wrapping tools (one power-driven and the other manually operated),
two grades of wire (one comparatively hard and the other comparatively
soft) and two operators. All 16 combinations of the four parameters
now are included at least twice in each test.
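
The specimen plan amounts to a replicated two-level factorial layout in four factors. A minimal sketch of enumerating it, in which the level labels (including the operator names "A" and "B") are placeholders chosen here and the replicate count of two is the minimum stated in the text:

```python
from itertools import product

# Four factors, two levels each; labels are illustrative placeholders.
factors = {
    "bit": ("tight", "loose"),
    "tool": ("power", "manual"),
    "wire": ("hard", "soft"),
    "operator": ("A", "B"),
}

combinations = list(product(*factors.values()))
replicates = 2   # each combination appears at least twice

plan = [
    dict(zip(factors, combo))
    for combo in combinations
    for _ in range(replicates)
]
print(len(combinations), len(plan))
```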
3.2 Results of 105°C Accelerated Aging Test


Altogether, more than 11,000 solderless wrapped connections have 
been subjected to the 105°C accelerated aging test. Most of the tests 
have been aimed at the problem of finding terminal configurations which 
would yield reliable connections and which, at the same time, could be 
manufactured economically. The results of a number of those tests are 
summarized in Table I, along with the results of a screening test that is 
described in Section 3.4. 

Fig. 5 shows the results that were obtained with No. 24 gauge tinned 
copper wire wrapped on terminals punched from several widely used 
thicknesses of nickel silver stock. The performance of the thinner ter- 
minals was considered to be unsatisfactory. Various types of longitudi- 
nal embossing were investigated to find means for improving the thin 
terminals, and the form shown in Fig. 6 finally was standardized. Al- 
though this form is not necessarily optimum, it is a convenient form to 
manufacture; and it has behaved well in the accelerated aging test, as 
shown in Table I. 


TABLE I — RESULTS OF ACCELERATED AGING TESTS ON SOLDERLESS
CONNECTIONS WRAPPED WITH TINNED COPPER WIRE
175°C Screening Test 


105°C Aging Test 


No. 22 
Wire 


No. 24 
Wire 


No. 22 No. 24 


Type of Terminal Material Finish Wire Wire 


Rectangular 


0.045 X 0.045 
0.0319 K 0.062 
0.0319 XK 0.062 


296; O 168; O 328.0 
152.0.020 160) 0 160 
104; 0 9760.15 20500 


Electrotin 136; 0 
Electrotin|152 0.026 
None 152) O 


Brass 
Brass 
Nickel Silver 


0.0253 X 0 
0.0201 K 0 
0.0159 XK 0 
0.0126 X 0 
0.010 x0 


Smbossed 
0.0253 X 0 
0.0201 K O 
0.0159 XK O 
0.0126 XK 0 
0.010 XO 


Wire-Spring 


Nicke 
Nicke 
Nicke 
Nicke 
Nickel 


062 
062 
.062 
062 
.062 


.062 | Nickel 
.062 | Nickel 
.062 Nickel 
.062 Nicke 
.062 Nickel 


Relay 


Single Wire 


Twin Wire 


Nickel 
Nickel 


Silver 
Silver 
Silver 
Silver 
Silver 


Silver 
Silver 
Silver 
Silver 
Silver 


Silver 
Silver 


None 
None 
None 
None 
None 


None 
None 
None 
None 
None 


None 
None 


240.053 105 0.37 
144.0.084 1160.88 
24)1.2 
2411.8 
96)2.1 


1360.31 


1440.12 288 0 
144.0.021 513'0.006 


93.0 
3420 
1210.: 
203.0 


1164.0 


13710 
2740 
1310 
1310 
180.0 
Fig. 5 Results of 105°C accelerated aging tests. All terminals flat nickel sil-
ver, 0.062 inch wide. Connections wrapped with No. 24 gauge tinned copper wire.


The early single-wire terminal of the wire-spring relay was formed
from a round wire by flattening, serrating on one side and solder-dip-
ping. Its performance was unsatisfactory, but it was improved simply
by omitting the solder. Its present form is shown in Fig. 7.

    Thickness of Stock     Nominal Dimension "A"
    (Inch)                 (Inch)

    0.010                  0.021 ± 0.002
    0.013                  0.023 ± 0.002
    0.016                  0.025 ± 0.002
    0.020                  0.026 ± 0.002

    Other dimensions shown on the drawing: 0.012 max.; 0.007 max.,
    0.002 min.; 0.060 ± 0.005.

Fig. 6 — Embossing approved for Bell System terminals.


Fig. 7 — Single-wire terminal of wire-spring relay.

The twin-wire terminal of the wire-spring relay was more of a prob-
lem. Many designs were conceived and investigated, but most of the
designs that showed promise from the solderless wrapping standpoint
were objectionable from the manufacturing standpoint. In the end, a
compromise design was adopted. The twin wires are twisted tightly
together, then they are coined in a closed die which has a trapezoidal
cross section. The resulting terminal is shown in Fig. 8.

Although most of the 105°C tests have been made with No. 24 gauge 
wire, a number of tests have been made also with No. 22 gauge wire. As 
shown in Table I, the results indicate that No. 22 gauge connections on 
the heavier terminals are as stable as No. 24 gauge connections, but that 
No. 22 gauge connections on the lighter terminals are inferior. 

The aging tests of No. 26 gauge connections are not completed yet. 
The preliminary results indicate, however, that No. 26 gauge connec- 
tions, even when wrapped with as many as nine turns of wire, are less 
stable than six-turn No. 24 gauge connections on the same types of ter- 
minals. 

For certain types of wiring in the Bell System, it has been standard 
practice to use untinned copper wire. Solderless connections wrapped 
with untinned wire, however, tend to deteriorate sooner than connec- 
tions wrapped with tinned wire. Fig. 9 illustrates results obtained with 
untinned wire. 

Fig. 8 — Twin-wire terminal of wire-spring relay.

Fig. 9 Effect of 105°C accelerated aging test on connections wrapped with
untinned copper wire on electrotinned brass terminals. Sample size 32 in each
case.

The data that have been presented so far should be sufficient to pro-
vide a bird’s-eye view of the results that have been obtained in the
105°C aging test. Other alleys and byways have been explored, but the
picture would not be sharpened perceptibly by presenting more details.
It is more pertinent at this point to consider what use can be made of
the test results.


3.3 Criterion for Acceptance (105°C Aging Test) 


The purpose of the accelerated aging test, of course, is to distinguish 
between those types of solderless wrapped connections which are likely 
to satisfy the 40-year stability objective in service and those types which 
are not likely to satisfy the objective. The test results supply a reason- 
ably clear picture of the relative instability of the several types of con- 
nections, but this by itself is not enough. Somewhere on the instability 
scale the development engineer eventually must draw a line and say, at 
least to himself, “I will approve the use of types of connections that fall
below this line; I will not approve types that fall above it.” In other 
words, he must establish a criterion for acceptance. 

The criterion for acceptance has not remained static as the develop- 
ment of solderless wrapped connections progressed. Instead, it has been 
revised several times as the aging test evolved and as the information 
obtained from aging tests expanded. Its present form has considerably 
more meaning and is more convenient to use than some of the earlier 
forms.

The criterion for acceptance is based upon two premises: (a) that the
105°C accelerated aging test does, in fact, produce at least as much
instability as 40 years of central office service will produce and (b)
that (2) is an adequate description of the ΔR distribution. Although
final confirmation will not be available until about 1990, evidence that 
these premises are valid is increasing year after year. 

It was stated previously that the value of m should not exceed 0.086 
in service if the resistance variation of not more than one connection in 
10,000 is to exceed 0.1 ohm. If the 105°C test is an adequate simulation 
of service conditions, then those types of connections which have values 
of m below 0.086 in the 105°C test should be acceptable. This would be 
the criterion for acceptance if very large samples were tested. Where 
small samples are tested, however, it is prudent to make allowance for 
sampling errors. 

As stated earlier, the variance of the distribution described by (2) is 
equal to m. Consequently, the standard deviation of the distribution is 
equal to the square root of m. If a large number of samples of size N 
are drawn from a population whose true mean is m’, then roughly five 
per cent of the sample means can be expected to fall below the value 


/ x , 
/, —T f/m 
Moo, = m — 1.645 / v° (4) 


A reasonable acceptance level, then, is

$$m_{0.05} = 0.086 - 1.645\sqrt{\frac{0.086}{N}} = 0.086 - \frac{0.482}{\sqrt{N}} \tag{5}$$

In other words, the stability of the connections will be considered
acceptable if the mean, m, of a sample of N connections is less than the
value of $m_{0.05}$ given by (5). The acceptance level is plotted as a
function of N in Fig. 10.
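As a rough illustration, the acceptance level can be tabulated for a few sample sizes. The function name `acceptance_level` and the sample sizes chosen are assumptions made for this sketch; only the constants 0.086 and 1.645 come from the text.

```python
import math

Z_05 = 1.645  # one-sided 5 per cent point of the normal distribution


def acceptance_level(n, m_true=0.086):
    """Acceptance level for a sample of n connections.

    m_true is the limiting mean m' (0.086 for the 105 deg C test).
    """
    return m_true - Z_05 * math.sqrt(m_true / n)


if __name__ == "__main__":
    # Illustrative sample sizes only; they are not taken from the paper.
    for n in (32, 50, 100, 200, 500):
        print(n, round(acceptance_level(n), 4))
```

The level rises toward 0.086 as N grows, which is the behavior described for very large samples.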

As an aid to decision making, it also is possible to set up a criterion
for rejection. If a large number of samples of size N are drawn from a
population whose true mean is m’, then roughly 95 per cent of the
sample means can be expected to fall below the value

$$m_{0.95} = m' + 1.645\sqrt{\frac{m'}{N}} \tag{6}$$

A reasonable rejection level for the case m’ = 0.086, then, is

$$m_{0.95} = 0.086 + \frac{0.482}{\sqrt{N}} \tag{7}$$

Fig. 10 — Criteria for acceptance and rejection in 105°C accelerated aging
test. Points are results of 105°C tests on approved types of solderless wrapped
connections.


In other words, the stability of the connections will be considered
unacceptable if the mean, m, of a sample of N connections exceeds the
value of $m_{0.95}$ given by (7). Further testing of that particular
type of connection is not very likely to be profitable, so it might as
well be rejected.

Establishing a rejection limit that does not coincide with the
acceptance limit provides a zone of uncertainty in which the connections
are neither clearly acceptable nor clearly unacceptable. It recognizes
the possibility that further testing of a marginal type of connection
might demonstrate that the type is acceptable. If the value of being
able to approve that type of connection outweighs the cost of further
testing, then it may be worthwhile to continue.
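The three-zone decision described above can be sketched as follows. The function names and the returned labels are assumptions made for illustration; the limits are the acceptance and rejection levels for m’ = 0.086 defined in the text.

```python
import math

Z = 1.645  # one-sided 5 per cent point of the normal distribution


def limits(n, m_true=0.086):
    """Return (acceptance level, rejection level) for a sample of n."""
    half_width = Z * math.sqrt(m_true / n)
    return m_true - half_width, m_true + half_width


def classify(sample_mean, n, m_true=0.086):
    """Classify a sample mean m as 'accept', 'reject' or 'uncertain'."""
    lower, upper = limits(n, m_true)
    if sample_mean < lower:
        return "accept"
    if sample_mean > upper:
        return "reject"
    return "uncertain"  # zone in which further testing may be worthwhile


if __name__ == "__main__":
    # Hypothetical sample means for a sample of 100 connections.
    for m in (0.0, 0.05, 0.2):
        print(m, classify(m, 100))
```

A result of "uncertain" does not settle the matter; it only marks the cases in which the cost of further testing has to be weighed against the value of an approval.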

On the average, the acceptance criterion of (5) is conservative. Not 
only does it provide margin for moderate sampling errors, it also pro- 
vides some margin, on the average, for an error in the basic premise 
that the 105°C test adequately simulates aging in service. 

At the same time, the form of (5) is helpful to the experimenter, for
it tells him the minimum sample size that can be used as the basis for
acceptance. By setting $m_{0.05}$ equal to zero in (5), the corresponding
minimum sample size is found to be 32. If he tests 32 connections and
m turns out to be zero, he is entitled to approve the connections without
further testing. With a smaller sample, he would not be entitled to
approve them even though m turned out to be zero.

From a practical standpoint, it is prudent to test a larger sample
when approval is needed in the shortest possible time. If ΔR for even a
single connection exceeds 0.001 ohm in a sample of 32, then the
connections cannot be approved, the test has to be expanded, and the
final decision is delayed. A practical compromise is to test a sample
large enough so that four connections could exceed 0.001 ohm slightly
without causing m to exceed $m_{0.05}$. If x = 1 for the four
connections, then m = 4/N and, from (5), the corresponding sample size
is 104.
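The two sample sizes quoted above can be checked by a direct search. This is a minimal sketch: the function names and the search bound `n_max` are assumptions, and `total_x` stands for the sum of x over the sample, so that m = total_x / N.

```python
import math

Z = 1.645  # one-sided 5 per cent point of the normal distribution


def acceptance_level(n, m_true):
    return m_true - Z * math.sqrt(m_true / n)


def min_sample_size(total_x, m_true=0.086, n_max=100_000):
    """Smallest n for which m = total_x / n is below the acceptance level."""
    for n in range(1, n_max + 1):
        if total_x / n < acceptance_level(n, m_true):
            return n
    raise ValueError("no sample size up to n_max satisfies the criterion")


if __name__ == "__main__":
    print(min_sample_size(0))  # 32
    print(min_sample_size(4))  # 104
```

The search stops at the first N that satisfies the criterion, which is sufficient because the margin between the acceptance level and total_x / N grows steadily with N.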


3.4 175°C Screening Test 


Although the 105°C test appears to be quite effective in distinguish- 
ing between stable connections and unstable connections, its slowness is 
a practical disadvantage. In cases where a number of alternate terminal 
designs are being considered, for example, and where it is desirable to 
concentrate development effort on a few of the most promising alter- 
nates, the five-month waiting period can be extremely inconvenient. 
There is need, therefore, for a quick screening test that will serve to 
identify those types of connections that are likely to pass the 105°C 
aging test. 

Early experience with the 105°C test suggested that addition of 
mechanical disturbances to the original three-hour undisturbed 175°C 
test might make it capable of detecting potentially unstable connections. 
Within limits, this proved to be so. 

In the screening test that eventually was standardized, the solderless 
wrapped connections are baked in an oven for three hours at 175°C. 
The three-hour period includes the warming up period of the oven and 
fixtures, which amounts to about one-half hour with the equipment that 
has been used. During the entire three-hour period, each terminal is 
plucked automatically once per minute. The plucking mechanism deflects 
the free end of each terminal ;'g inch, then releases it abruptly, allowing 
the terminal and lead wire to vibrate freely. The opposite end of the 
terminal is supported rigidly, of course, and the unsupported length (on 
which the connection is wrapped) is }$ inch. 

At the end of the 175°C heat treatment, the connection is cooled to 
room temperature, and its resistance variation is measured as described 
previously except that (a) the terminal is plucked only once during the 
measurement instead of two or three times, and (b) it is plucked 7 
inch instead of #5 inch. 

In the ordinary cases, where there is no soft solder in the connection, 
the correlation between the results of this test and the results of the 
105°C test has been reasonably good. The principal discrepancy is that 
the 175°C test is not as sensitive as the 105°C test; that is, the spread 
between stable and unstable connections tends to be smaller in the 175°C 
test than in the 105°C test. This can be seen in Table I. 

Despite its limitations, the 175°C screening test has been extremely
useful in conserving testing effort and in speeding up various phases of
the development program. Altogether, more than 16,000 solderless
wrapped connections have been subjected to the 175°C test. On a few
occasions, final approval of particular types of connections has been
based upon results of the 175°C test, although the usual practice is to
withhold final approval until the 105°C test is completed.


3.5 Criterion for Acceptance (175°C Screening Test) 


Empirically, it is possible to define acceptance and rejection limits 
for the 175°C test that will correspond roughly to the limits for the 105°C 
test. Although such limits actually have not been used in the past, it 
appears that limits based upon m’ = 0.22 for the 175°C test would have 
led to essentially the same decisions that were made on the basis of 
105°C test results. For m’ = 0.22, the limits for the 175°C test would 
be 


$$m_{0.05} = 0.22 - \frac{0.77}{\sqrt{N}} \tag{8}$$

$$m_{0.95} = 0.22 + \frac{0.77}{\sqrt{N}} \tag{9}$$

These limits are shown in Fig. 11, along with observed values for various 
types of connections that have been subjected to both the 175°C and 
the 105°C tests. The symbols used for the values observed in the 175°C 
test indicate how similar types of connections fared in the 105°C test. 
As indicated previously, the limits for the 175°C test can serve as
guides for making decisions. If m for a particular type of connection is
less than $m_{0.05}$, it probably is worthwhile to make tests at 105°C.
If m is greater than $m_{0.95}$, it probably is not worthwhile. If m
falls between $m_{0.05}$ and $m_{0.95}$, the decision will have to be
based upon additional considerations.

Fig. 11 — Criteria for acceptance and rejection in 175°C screening test. Points
are results of 175°C tests, but symbols indicate how similar connections behaved
in 105°C tests.

The sample size for which $m_{0.05}$ = 0 is about 13. The sample size
for which $m_{0.05}$ = 4/N is about 41. A sample of 13, then, is the
smallest sample that should be tested, and 41 is a more realistic
minimum.
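The two sample sizes follow from the acceptance limit for m’ = 0.22 by simple arithmetic, worked here as a check. The substitution $u = 1/\sqrt{N}$ is introduced only for this illustration.

```latex
\begin{align*}
1.645\sqrt{0.22} &\approx 0.77 \\
0.22 - \frac{0.77}{\sqrt{N}} = 0
  &\;\Rightarrow\; N = \left(\frac{0.77}{0.22}\right)^{2} = 12.25
  \;\Rightarrow\; N_{\min} = 13 \\
\frac{4}{N} = 0.22 - \frac{0.77}{\sqrt{N}}, \quad u = \frac{1}{\sqrt{N}}
  &\;\Rightarrow\; 4u^{2} + 0.77u - 0.22 = 0 \\
u = \frac{-0.77 + \sqrt{0.77^{2} + 4(4)(0.22)}}{8} \approx 0.1573
  &\;\Rightarrow\; N = \frac{1}{u^{2}} \approx 40.4
  \;\Rightarrow\; N_{\min} = 41
\end{align*}
```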


IV. MECHANICAL TESTS 


Certain standard mechanical tests are made regularly to qualify
wrapping bits for use in production, and the same tests have been made
to qualify the bits used in the evaluation studies. Sample connections
are wrapped on specified types of terminals, then two different types of
tests are made on the sample connections. In the first test, the force
required to strip the connection off the terminal is measured. It is
required that this stripping force be at least 3000 grams and that the
median for any subgroup of five connections be at least 4200 grams. The
purpose of this test is to provide assurance that the bit is capable of
wrapping tight connections.
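The stripping force requirement can be expressed as a simple check. This is a sketch under stated assumptions: the function name is hypothetical, the data are supplied as subgroups of five measured forces in grams, and the example values are invented for illustration.

```python
from statistics import median

MIN_INDIVIDUAL_G = 3000        # minimum stripping force for any connection
MIN_SUBGROUP_MEDIAN_G = 4200   # minimum median for a subgroup of five


def bit_passes_stripping_test(subgroups):
    """Return True if every subgroup of five forces meets both limits."""
    for group in subgroups:
        if len(group) != 5:
            raise ValueError("each subgroup must contain five connections")
        if min(group) < MIN_INDIVIDUAL_G:
            return False
        if median(group) < MIN_SUBGROUP_MEDIAN_G:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical measurements, in grams.
    print(bit_passes_stripping_test([[4500, 5000, 4300, 3900, 6000]]))
    print(bit_passes_stripping_test([[3100, 3200, 4100, 4500, 5000]]))
```

The first hypothetical subgroup meets both limits. The second fails because its median, 4100 grams, is below 4200 grams even though no single value is below 3000 grams.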

In the second test, the wire is unwrapped from the terminal, the
unwrapping force being applied to the wire at least one-half inch from
the terminal without the wire being restrained from twisting or bending
back upon itself. It is required that the connection be capable of being
unwrapped in this fashion without the wire breaking. The purpose of this
test is to provide assurance that the bit does not wrap too tightly,
that is, so tightly that the wire would be weakened excessively.

Assuming that the connections have been wrapped with qualified 
bits, it is of interest to consider what might happen to them subsequently 
in service. In general, this reduces to a consideration of mechanical 
treatments that would tend to loosen the connection and break the wire. 

The principal types of mechanical treatment that could loosen a 
wrapped connection are (a) squeezing the sides of the connection, (b) 
pushing or pulling on the body of the connection and, of course, (c) 
unwrapping the connection. 

Although the squeezing forces that would be required to loosen con- 
nections have not been measured, it is evident that enough force could 
be exerted with a pair of pliers to damage a connection. Wiremen and 
maintenance men have been cautioned, therefore, not to squeeze the 
connections — either with pliers or with test clips. 

The force required to slide the connection along the terminal
ordinarily is well above the minimum requirement of 3000 grams, ranging
up to more than 10,000 grams in some cases. In general, the heavier
terminals tend to give higher stripping forces than do the lighter
terminals, and the heavier gauges of wire tend to give higher values
than do the lighter gauges. Figs. 12 and 13 illustrate the
relationships.

Fig. 12 — Effect of terminal thickness on stripping force. Connections wrapped
with No. 24 gauge tinned copper wire on flat nickel silver terminals 0.062 inch
wide. Sample size 100.





[Plot: mean stripping force in kilograms versus wire gauge number;
points labeled by number of turns.]

Fig. 13 — Effect of wire size on stripping force. Connections wrapped with
tinned copper wire on 0.0319 X 0.062-inch nickel silver terminals. Sample size 50
to 175.

Ordinarily there is little risk that a connection will be unwrapped in
normal wiring or maintenance operations — unless, of course, the wire 
is unwrapped deliberately to remove the connection. It is possible, 
nevertheless, to unwind a connection by pulling steadily on the lead 
wire in the direction parallel to, and in line with, the terminal axis. On 
the average, a force of about 825 grams is sufficient to unwrap one-half 
turn of a No. 24 gauge connection on a 0.0319 X 0.062-inch terminal, 
and a force of about 2300 grams will unwrap one full turn. The standard 
deviations are about 15 per cent of these values. 

The principal types of mechanical treatment that are liable to break
wires in service are (a) tension alone, (b) repeated bending alone and
(c) repeated bending combined with tension. A number of tests have
been made to compare the effects of these treatments on wires connected
to terminals by solderless wrapping with the effects on wires connected
by soldering.

The results of a few tensile tests are summarized in Table II. When 
the wire was pulled radially (perpendicular to the terminal axis), almost 
the full breaking strength of the wire was realized with the soldered 
connections. The breaking strength with the solderless wrapped con- 
nections was about eight per cent lower, however, because the wire was 
indented where it had been wrapped around the corners of the rectan- 
gular terminal. 

It is interesting to note that, even with soldered connections, the full
breaking strength of the wire was not realized when the wire was pulled
axially. The soldered joint was broken in stages, and, in most cases,
the wire was completely unwrapped from the terminal before the force
reached the breaking strength of the wire. With the flat terminals, the
ultimate strength of the solderless wrapped connections was significantly
lower than that of the soldered connections. With the wire-spring relay
terminals, on the other hand, the differences between the solderless
wrapped and soldered connections were trivial. The serrations of the
single-wire terminal and the irregularities of the twisted and coined
twin-wire terminal were as effective as soldering in locking the wrapped
wire in place.

TABLE II — MEAN ULTIMATE STRENGTH (IN GRAMS) OF CONNECTIONS
MADE WITH NO. 24 GAUGE COPPER WIRE

Strength columns: Solderless Wrapped Connections, 6 Turns;
Wrapped and Soldered Connections, 1½ to 2 Turns

Type of Terminal    Direction of Pull on Wire    Strength
0.0319 X 0.062      Axial                        2330
0.0319 X 0.062      Radial                       4814
Single Wire         Axial                        4450
Twin Wire           Axial                        4527

Fig. 14 — Effects of repeated bending without tension. Connections made with
No. 24 gauge tinned copper wire on single-wire terminals of wire-spring relay.
Sample size 16.

The effects of repeated bending without tension are illustrated in Fig.
14. The test was performed by bending the wire back and forth through
the angles indicated until the wire broke. The bending moments were
large enough to exceed the elastic limit of the copper wire. In Fig. 14
conventional ±3σ control limits (or confidence limits) are shown with
each plotted point as a simple, graphic way of indicating that the
performance of the solderless wrapped connections was significantly
better, on the average, than that of the soldered connections.

Vibration tests, of course, provide another method for measuring the
effects of repeated bending without tension. Fig. 15 shows the results
of a vibration test performed by the Western Electric Company on
small equipment units wired with local cables. The ±3σ control limits
in this case are based upon the observed breakage ("fraction defective")
of the soldered wires. The fact that the observed breakage of the
solderless wrapped wires consistently fell below the lower control limit
for the soldered wires indicates that the performance of the solderless
wrapped wires was superior on the average.

The effects of repeated bending combined with tension are shown in
Fig. 16. In this test, the wire was kept under tension continuously while
it was bent from its starting position perpendicular to the terminal axis
to a second position parallel to the terminal axis and then back to its
starting position. This cycle was repeated, always in the same direction
from the starting position, until the wire broke. In Fig. 16, the ±3σ
control limits indicate again that the performance of the solderless
wrapped wires was better, on the average, than the performance of the
soldered wires.
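Control limits on a fraction defective, such as those used for the vibration test of Fig. 15, are conventionally computed from the binomial standard deviation. The paper does not state the formula it used, so the following is only a sketch of the conventional calculation. The function name and the 10 per cent breakage figure are hypothetical; the sample size of 550 is the one given for Fig. 15.

```python
import math


def p_chart_limits(fraction_defective, sample_size):
    """Conventional 3-sigma control limits for a fraction defective."""
    p = fraction_defective
    sigma = math.sqrt(p * (1.0 - p) / sample_size)
    lower = max(0.0, p - 3.0 * sigma)  # a fraction cannot fall below 0
    upper = min(1.0, p + 3.0 * sigma)  # or rise above 1
    return lower, upper


if __name__ == "__main__":
    # Hypothetical breakage of 10 per cent among 550 soldered wires.
    lower, upper = p_chart_limits(0.10, 550)
    print(round(lower, 4), round(upper, 4))
```

An observed breakage for the wrapped wires that falls below the lower limit computed from the soldered wires would, on this reasoning, indicate better performance than chance variation alone explains.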

From the results of tests such as those which have been described, it
has been concluded that the solderless wrapped connection satisfies the
mechanical security objectives in part, but not completely. It is more
vulnerable to loosening by axial pull on the wire than is the soldered
connection, so this limitation should be recognized in authorizing
applications of solderless wrapping. In its ability to withstand
vibration and repeated bending of the lead wire, however, the solderless
wrapped connection appears to be fully as good as, and probably better
than, the soldered connection.


V. APPROVED COMBINATIONS OF TERMINALS AND WIRE 


Present approvals of specific combinations of terminals and wire are
based very largely upon the results of the accelerated aging tests that
have been described, upon the ability of the manufacturer to
manufacture the terminals economically and upon a limiting dimension of
a wrapping bit that is used widely.

Fig. 15 — Results of vibration test on small equipment unit wired with local
cable. Sample size 550 for each type of connection.

Fig. 16 — Effects of repeated bending with wire under tension. Connections
made with No. 24 gauge wire on single-wire terminals of wire-spring relay. Sample
size 25 in each case.

The limiting dimension is the minimum diameter of the terminal hole 
in the bit used for wrapping No. 24 gauge wire. In order to be sure that 
No. 24 gauge wire can be wrapped on any approved terminal, the maxi- 
mum dimensions of the cross-section are limited so that the terminal 
will pass through a circular opening 0.073 inch in diameter. 

Solderless wrapping with No. 22 gauge tinned copper wire is approved 
on terminals of rectangular cross section whose nominal thickness is at 
least 0.030 inch and whose minimum diagonal exceeds 0.061 inch. 

Solderless wrapping with No. 24 gauge tinned copper wire is approved 
on (a) terminals of rectangular cross section whose nominal thickness is 
at least 0.025 inch and whose minimum diagonal exceeds 0.059 inch, 
(b) embossed terminals of the form shown in Fig. 6 punched and formed 
from flat stock whose nominal thickness is at least 0.010 but less than 
0.025 inch and (c) the wire-spring relay terminals in Figs. 7 and 8. 
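For terminals of rectangular cross section, the dimensional rules above can be sketched as a simple check. This is an illustration only. It ignores manufacturing tolerances by treating the supplied thickness and width as both the nominal and the minimum dimensions, it does not cover the embossed or wire-spring relay terminals, and the function and constant names are hypothetical.

```python
import math

MAX_BIT_HOLE_DIAMETER_IN = 0.073  # terminal must pass through this opening

# Wire gauge -> (minimum nominal thickness, value the diagonal must exceed),
# both in inches, for terminals of rectangular cross section.
RECTANGULAR_RULES = {
    22: (0.030, 0.061),
    24: (0.025, 0.059),
}


def rectangular_terminal_ok(gauge, thickness_in, width_in):
    """Check a rectangular terminal against the stated dimensional rules."""
    min_thickness, min_diagonal = RECTANGULAR_RULES[gauge]
    diagonal = math.hypot(thickness_in, width_in)
    return (
        thickness_in >= min_thickness
        and diagonal > min_diagonal
        and diagonal <= MAX_BIT_HOLE_DIAMETER_IN
    )


if __name__ == "__main__":
    # The 0.0319 X 0.062-inch terminal used in several of the tests.
    print(rectangular_terminal_ok(22, 0.0319, 0.062))
    print(rectangular_terminal_ok(24, 0.0319, 0.062))
```

The 0.0319 X 0.062-inch terminal has a diagonal of about 0.070 inch, so under these simplifying assumptions it satisfies the rules for both wire gauges.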

Approved terminal materials include nickel silver, brass, phosphor
bronze and silicon copper. The best terminal materials for solderless
wrapping have a high modulus of elasticity, a low rate of stress
relaxation and a temperature coefficient of expansion near that of the
wire.

In general, a copper flash plus an electrotinned finish is required on 
brass, phosphor bronze and silicon copper terminals, or they may be 
punched from either tin-coated or solder-coated stock. No finish is re- 
quired on nickel silver terminals, although any of the foregoing finishes 
is permissible and is specified in some cases to facilitate soldering. Solder 
dipping is not approved, because of the risk of obtaining abnormally 
thick coatings from time to time and the undesirable effect that this 
could have upon the stability of connections wrapped on such terminals. 

The use of untinned copper wire has been approved only in cases 
where the required service life of the connections is substantially less 
than 40 years. Furthermore, such approvals are limited to the heavy 
terminals which are approved for use with No. 22 gauge wire. 

There have been exceptions to some of the standards described in the 
preceding paragraphs. Solderless wrapping on solder-dipped terminals 
and on thin, flat terminals was approved on a limited basis early in the 
development program. Those connections, however, are in circuits 
where trouble, if it should occur, would be detected automatically, could 
be located quickly and could be corrected easily by soldering the defec- 
tive connection. For future production, those terminals are being brought 
into agreement with present standards. 


VI. PERFORMANCE IN SERVICE 


The first field trial of solderless wrapped connections was installed in 
1950, and limited use of solderless wrapping in regular manufacture 
began a few years later. During the period 1950 to 1958, the service 
performance of solderless wrapped connections appears to have been 
highly satisfactory. 

A two-year survey of 411,000 solderless wrapped connections in five 
central offices showed a lower wire breakage rate than would have been 
expected with soldered connections, and the difference was great enough 
to be considered statistically significant. 

There has been no report of solderless connections being pulled off 
of terminals in service or of being partially unwrapped, and inspection 
of about 20,000 connections in two central offices revealed no sign of 
partial unwrapping. 

The resistance variations of 135 connections were measured after two
years in service, and 160 more were measured after three and one-half
years of service. The highest value of ΔR observed was 0.0004 ohm, so
the value of m for the 295 connections still was zero.


VII. CONCLUSION 


The laboratory studies and field experience to date have provided 
considerable assurance that properly designed and properly made solder- 
less wrapped connections will perform satisfactorily for 40 years in cen- 
tral office service. The use of solderless wrapping is being expanded, 
therefore, in the Bell System. Suitable terminals now are being provided 
on many types of telephone apparatus. Design changes have been au- 
thorized to provide suitable terminals on a number of additional types 
of apparatus, although the changes have not been introduced yet in 
manufacture. And design changes to provide suitable terminals on still 
other types of apparatus are being studied. 

Several hundred million solderless connections are being wrapped each 
year in the Bell System now. The number will grow, but it is difficult to 
predict what the saturation level will be. Inevitably, the solderless 
wrapped connection is in competition with soldered connections, clinched 
connections, welded connections and all the other types. In the end, the 
choice of connections for any particular application is likely to be an 
economic choice — based not only upon the cost of the labor involved 
in making a connection, but upon many other factors as well. The cost 
of modifying terminals, for example, is an obstacle to solderless wrap- 
ping on existing apparatus. In some cases, the cost of modifying the 
terminals of a particular type of apparatus outweighs the potential sav- 
ings of solderless wrapping. It is not likely, therefore, that solderless 
wrapping ever will displace other types of connections completely. 

The present program for central office equipment in the Bell System 
calls for modification of the terminals of existing types of apparatus in 
those cases where the preparation expense clearly is outweighed by the 
direct savings from solderless wrapping and the indirect savings from 
the use of plastic insulated wire, which is made practicable by solderless 
wrapping. In other cases, modification of terminals will be deferred until 
present manufacturing tools wear out and have to be replaced. 

In designing new types of apparatus, especially switching apparatus,
the trend is to provide terminals suitable for solderless wrapping at
the start. Furthermore, every effort is made to arrange the terminals in
modular arrays that will facilitate automatic wiring by machines. These
two steps are opening a door to economical use of automatic wiring in
manufacture. It seems quite probable, therefore, that telephone
switching systems of tomorrow will be well populated with solderless
wrapped connections.